ABSTRACT



This document is the final report of a study designed to investigate
the performance of the Broad-Range Tailored Test of Verbal Ability. The
Broad-Range Tailored Test (BRTT) is a computerized adaptive test developed
by Frederic Lord. It employs a maximum likelihood selection strategy
to choose items from an item pool stored on magnetic disk. The items
selected are tailored to the individual testee and are presented on a
computer terminal. Each testee responds to 25 items; at the conclusion
of the test the computer calculates a verbal ability score for the
individual. The test was designed to yield a verbal ability score from
the fifth grade level to the graduate school level.

Performance of the BRTT had been investigated by means of simulation
studies. The current study is the first empirical test of its performance.
Two forms of the BRTT were administered to 146 high school students. The
students also answered a posttest questionnaire in which they indicated
their reactions to this form of testing.

Analyses revealed that the BRTT was more reliable than the PSAT for
a number of scores derived from the data. The test-retest reliability of
the BRTT was .87 at the 25th item; reliability of the PSAT verbal score
(scaled down to 25 items) was .65. Analyses of the reliability of the
BRTT vs. the PSAT revealed that the tailored test was also more reliable
than the conventional test at shorter lengths. Correlations between
scores on the BRTT and PSAT were reasonably high, typically about .86.
This finding confirms theoretical expectation regarding the increased
efficiency of adaptive, as compared to conventional, tests.

The study investigated nine observed scores and score
transformations. The most useful of these was found to be the expected
proportion correct over the entire item pool. This score was highly
reliable and was found to be parallel with respect to the mean values
across forms A and B. The commonly employed latent-trait parameter θ,
and a monotone transformation of θ, did not exhibit this characteristic.

The information functions of the BRTT were calculated and compared
favorably with simulation results previously reported by Lord. Thus the
accuracy of the BRTT was in accord with theoretical expectation.

Student response to the computerized testing procedure was generally
quite favorable. Students found the human-computer interface easy to
use and less fatiguing than a long pencil-and-paper test.

Operationally, the system performed well. Detailed analysis of
anomalous cases suggested refinements to the system. Response time was
adequate and consistent. Reliability of the hardware and software was
excellent. These results suggest that computerized adaptive testing is
ready to take the first steps out of the laboratory environment and find
its place in the educational community.

The recommendations emerging from this study are: (1) that the
organization collaborate with an interested client to develop an adaptive
test for use in an educational setting; (2) that the potential for
microprocessor-based systems for the delivery of adaptive testing be
evaluated; (3) that extensions to item response theory and the development
of alternative models for the provision of adaptive testing be explored;
and (4) that high priority be accorded the development of innovative
assessment strategies for computer presentation. Such items might involve
simulation and gaming, constructed responses, graphics, motion, sound,
and time-dependent responses.

Chapter I
Background of the Study

1.1 Purpose of the Study

As a major testing organization, Educational Testing Service has a
long-standing interest in and commitment to improving the testing process.
Although the organization uses computer technology to support the
administrative aspects of its testing programs (such as candidate
registration, item analysis, scoring, and reporting), there has been
little use of the computer as a vehicle for presenting test items.

Although the prospect of employing computers as testing devices has
intrigued psychometricians for over a decade (Weiss, 1973), two
considerations have militated against computerized testing: the first
obstacle was the high cost associated with this technology; the second
was the lack of an adequate psychometric theory to support adaptive
testing. In recent years, both obstacles have become less problematic.

The development of microelectronic technology has radically reduced
the cost of computer hardware, and forecasts predict this trend to continue
for a number of years (Noyce, 1977). It seems likely that computers will
soon be as readily accessible as telephones.

The development of item response theory (Lord and Novick, 1968)
provided a psychometric foundation on which adaptive tests could be
erected. A number of investigators have developed adaptive testing
models and explored their performance in both simulation and empirical
studies (McBride, 1976). The convergence of psychometric and technical
developments suggests the feasibility of practical computerized adaptive
testing.

One of the most promising of the adaptive testing models recently
developed is Lord's (1977a) Broad-Range Tailored Test of Verbal Ability
(BRTT).

The Broad-Range Tailored Test of Verbal Ability employs an item pool
stratified into difficulty levels and arranged by item type within
difficulty. The BRTT yields an ability score appropriate to students
from the fifth grade level to graduate school. McBride has characterized
the BRTT as "the most ambitious adaptive testing proposal to appear in
the literature, by virtue of the range of ability over which Lord intended
it to be used" (McBride, 1976, p. 54).

In designing the BRTT, Lord investigated about thirty designs for a
broad-range tailored test, administering each to approximately 1000
simulated examinees. The final design is described in detail in
section 2.1.

Lord (1977a) suggested that the appropriate next step in the
evaluation of the BRTT would be an empirical study of its performance when
administered to actual (rather than simulated) examinees. The present
study was designed to explore the empirical performance of the BRTT.

The present study involved two phases. The first was to design and
implement a computer system capable of administering the BRTT. The
second phase involved administering the two forms of the BRTT to 146 high
school students. The students' responses were analyzed to determine
their relationship to theoretical expectations.

1.2 Historical Antecedents of Computerized Adaptive Testing 

Weiss (1976) has traced the history of adaptive testing to Binet's
procedures for the assessment of intellectual functioning. Binet's
procedures were characterized by three aspects that are typical of
contemporary adaptive tests:

1. Prior information is used to select a starting point for the
assessment procedure. The tester determines the initial item
based on his or her judgment of the testee's ability.

2. The items presented depend, in part, upon the testee's responses
to previous items. Basal and ceiling levels are used to ensure
that most items are within the appropriate range of difficulty
and are neither too difficult nor too easy for the individual.

3. A stopping rule is employed to determine the point at which the
administration of items ceases. Thus, individual testees may
receive tests of different lengths.

Although individually administered tests may be subject to some
distortion owing to testee-examiner interaction effects (Rosenthal, 1966;
Wickes, 1956), this disadvantage is balanced by the examiner's ability to
maintain rapport, clarify ambiguities in items or responses, record
response latencies, probe responses of interest, and, generally, manipulate
the conditions of test administration to improve information yield. Thus,
the individually administered test has the potential to elicit considerable
information about the examinee. Because of their high yield, these tests
are often employed as clinical instruments (Harrison, 1965). Unfortunately,
individualized administration is too costly and time-consuming a process
to be employed in many assessment situations.

The inefficiency of individually administered testing has created a 
need to adopt less time-consuming and less expensive methods when large 
numbers of persons are to be tested. The group-administered standardized
paper-and-pencil test has become the accepted compromise between the
desirability of individual assessment and the need for efficient testing
of large numbers of people (McBride, 1976).

The need for standardized group-administered tests was recognized
prior to World War I. By 1918, the need for rapid classification of
recruits had spurred the development of the Army Alpha and Army Beta
tests, initiating a period of tremendous growth in group testing (Weiss
and Betz, 1973).

Group tests have a number of important advantages over individual
tests. Among these are the following:

1. Lower cost to administer and score, since a number of individuals
can be tested by a single administrator and machine scoring of
answer sheets is possible.

2. Reduction in examiner-effect variables due to reduction in
relationship factors; less reliance on examiner's judgment in
scoring.

3. Comparisons among testees are facilitated because each individual
receives the same set of items under uniform conditions.

Despite its economic and procedural advantages, the group-administered
objective multiple-choice test is far from an ideal testing instrument,
and psychologists have been intrigued by the potential utility of
assessment procedures that adapt to the individual.

Hutt (1947) compared "consecutive" with "adaptive" administration of
the Stanford-Binet. In the adaptive technique, he administered an easier
item following an incorrect response and a more difficult item following
a correct response. Hutt found that students who had poor school
adjustment obtained reliably higher IQ scores in the adaptive modality.

Hick (1951) employed Shannon's model of information to develop an
"up-down" technique in which testees would receive items for which they
had a 50% probability of choosing the correct answer. This procedure was
intended to obtain maximum information from the item responses. The
notion of information yield is a central one in current adaptive testing
strategies.

One form of adaptive test which can be administered with paper and
pencil is that in which several "peaked" tests are created, each with
overlapping ranges. An examinee who scores in the extreme range of a
test is retested with a more appropriate instrument.

One strategy, called the two-stage adaptive test, involves administer- 
ing a short "routing test" to all examinees, who are then directed to an 
appropriate second-stage test with items of relatively homogeneous 
difficulty (McBride, 1976). 

A two-stage adaptive strategy was tried experimentally by Angoff and 
Huddleston (1958), who concluded that although the use of two narrow-range 
(peaked) tests was slightly more reliable than a single broad-range test 
(.89 vs .85), the increase in validity coefficient for the two-stage
procedure would not exceed .02 on the average. One problem with a
two-stage adaptive strategy is that errors of measurement would cause
some testees to be misclassified by the routing test and to be routed to
an inappropriate second-stage test. Angoff and Huddleston felt that the
numbers of such misclassified students, although small, would be sufficient
to cause serious administrative problems and that the advantage of
heightened reliability "would not be great enough to warrant changing
to the administratively more complex two-level test system" (Angoff and
Huddleston, 1958, p. 4).

This strategy has also been investigated by Betz and Weiss (1974)
and Vale (1975).

Lord (1971) proposed a paper-and-pencil branching test which he
termed a flexilevel test. A flexilevel test of length k contains 2k - 1
items. Testees begin at the item of middle difficulty and are branched
to an item of greater difficulty following a correct response or to an
easier item following an incorrect response.
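The branching rule just described can be sketched in a few lines of Python. This is an illustrative sketch only: the function name and the callback are hypothetical, items are assumed to be indexed 0 to 2k - 2 in order of increasing difficulty, and "harder" and "easier" are assumed to mean the nearest item in that direction that has not yet been administered.

```python
def flexilevel_path(k, answers_correctly):
    """Sketch of a flexilevel administration of k items.

    Items are indexed 0 .. 2k-2 by increasing difficulty (an assumption).
    `answers_correctly(index)` is a hypothetical callback returning True
    when the testee answers the item at that index correctly.
    Returns a list of (item_index, correct) pairs in administration order.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    middle = k - 1                 # item of middle difficulty
    lo = hi = current = middle     # administered items always span lo..hi
    administered = []
    for _ in range(k):
        correct = answers_correctly(current)
        administered.append((current, correct))
        if correct:
            hi += 1                # next harder item not yet administered
            current = hi
        else:
            lo -= 1                # next easier item not yet administered
            current = lo
    return administered


# A testee who can answer items 0, 1 and 2 of a five-item (k = 3) test.
print(flexilevel_path(3, lambda index: index <= 2))
```

Because only k items are administered from a pool of 2k - 1, the path can never run past either end of the pool, which is why the pool needs exactly that many items.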

There is some evidence that paper-and-pencil adaptive tests are
superior to conventional tests of the same length (McBride, 1976; Vale,
1975). However, most research on adaptive testing has focused on branching
strategies which are sufficiently complex to require computer
administration.

A more sophisticated alternative to paper-and-pencil adaptive
testing is to employ a computer to select and administer individual
items. Testing in which the computer is used to individually select
items has been variously referred to as adaptive testing (Weiss and Betz,
1973), programmed testing (Cleary, Linn, and Rock, 1968a), branching
tests (Bayroff and Seeley, 1967), response-contingent testing (Wood,
1973), tailored testing (Lord, 1970), and computerized adaptive testing
(Kreitzberg, Stocking, and Swanson, 1978).

Computerized adaptive testing (CAT) has recently been a subject of
considerable research (Weiss and Betz, 1973; McBride, 1976). Conferences
on adaptive testing were held in Washington, D.C. in 1975, and at the
University of Minnesota in 1977 and 1979. In addition to the active
program being conducted at the University of Minnesota (Weiss, 1975), the
U.S. Civil Service Commission has been conducting research with a view
toward automation of Civil Service testing (Urry, 1976). Researchers at
Minnesota and the U.S. Civil Service Commission have developed computer
systems capable of administering adaptive tests.

The variables that have been studied by researchers include:
reliability and validity (Weiss, 1973; Waters, 1974; Urry, 1976), accuracy
at extremes (Lord, 1970), ability to reproduce conventional test scores
(Linn, Rock, and Cleary, 1972), information yield (Lord, 1970), effects of
varying step sizes (cf. Wood, 1973; McBride, 1976), measurement error
(Wood, 1973), and number of items presented (Wood, 1973; McBride, 1976).
Empirical research concerned with comparisons to conventional tests has
focused on: external validity (Olivier, 1974; Waters, 1974), internal
consistency reliability (Vale and Weiss, 1975), test-retest temporal
stability (Betz and Weiss, 1974; Larkin and Weiss, 1974), and
characteristics of score distributions (Betz and Weiss, 1973).

In section 1.5, major strategies for computer-administered adaptive
testing will be considered. Prior to discussing computerized adaptive
testing, it is appropriate to review the psychometric foundations on
which computerized adaptive tests are built.

1.3 Psychometric Foundations

In classical test theory (Gulliksen, 1950), item parameters are
defined in terms of group data. For example, the difficulty of an item
is defined as the proportion of individuals who get the item correct.
This proportion is not an inherent property of the item, but will vary
with the group; a given item may be difficult for a group of low ability
and easy for a group of high ability.

Although it is possible to develop adaptive tests based on classical
test theory (e.g., Angoff and Huddleston, 1958), there are three issues
which are not easily resolved within this theory (Kreitzberg, Stocking,
and Swanson, 1978):

1. Scoring. Since different examinees receive different items, the
traditional number-right score used by classical test theory is
inappropriate. This raises questions regarding the method of
scoring the test and the comparison of scores received by
different individuals.
2. Item parameters. Since appropriate items are selected individually
for each examinee, item characteristics must be population-invariant.
However, as previously noted, classical test theory employs
group-dependent item parameters.
3. Comparing strategies. There are many possible strategies for
selecting items and scoring responses. Conventional tests are
usually evaluated by such measures as reliability and validity.
As these correlational indices are group-dependent, they may not
be appropriate for adaptive tests, since adaptive testing requires
that methods of comparing different strategies and scoring
procedures be independent of the group taking the test.

Unlike classical test theory, item response theory (Lord and Novick,
1968) allows the test scores of all examinees to be expressed within a
common metric, regardless of the fact that each examinee may have answered
different, and even different numbers of, items. This metric allows
ordering of examinees with respect to the trait to be measured and
quantification of the magnitude of the differences among examinees. The
item parameters employed by item response theory are independent of the
group to which the item is administered. In addition, techniques
for comparing item selection strategies and scoring procedures have been
developed which involve a consideration of the amount of "information"
obtained from a test at various levels of the trait being measured (Lord
and Novick, 1968), and which are also independent of the group to which
the test is administered.

In item response theory, it is assumed that some underlying trait is
to be measured. As this trait is unobservable, it is often called a
latent trait. It is assumed that the latent trait is unidimensional.
Generally, ability traits are considered, although achievement and
personality traits may also fit the model.

An individual is considered to possess a score θ which indicates the
level of trait he or she possesses. The true score of classical test
theory is an isomorphic transform of θ (Lord, 1980).

In item response theory, the probability of a correct response to an
item is assumed to be a function of the individual's ability level, θ,
and the psychometric properties of the item. For every possible level of θ,
there exists a probability of a correct response. The graph of this
function is typically shaped like a normal ogive; the exact shape of the
curve depends upon the psychometric properties of the specific item and
on the model chosen. Figure 1.3a illustrates some typical item
characteristic curves.




Figure 1.3a Typical Item Characteristic Curves

Because the normal ogive is mathematically intractable, an alternative
model known as the logistic model is generally employed. With the
logistic model, the probability of a correct response, given θ, is:



(Lord and Novick, 1968, p. 400).

The logistic model is characterized by two parameters, a and b.
Individuals with very low values of θ may sometimes get an item correct
by chance. To account for this guessing factor, a third parameter, c, is
added to the model (see section 2.3). The three-parameter logistic model
is the theoretical parent of the Broad-Range Tailored Test investigated
in the current study.
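For reference, the two forms described above are commonly written as follows in the item response theory literature. The rendering is a sketch in standard notation, not a transcription of the report's display: the scaling constant D (conventionally 1.7, chosen so that the logistic curve approximates the normal ogive) is an assumption, and a, b, and c are the discrimination, difficulty, and guessing parameters named in the text.

```latex
% Two-parameter logistic model (D = 1.7 is the conventional scaling constant; assumed here)
P(\theta) = \frac{1}{1 + e^{-D a (\theta - b)}}

% Three-parameter logistic model: c is the lower asymptote reached by guessing
P(\theta) = c + (1 - c)\,\frac{1}{1 + e^{-D a (\theta - b)}}
```

Setting c = 0 in the second expression recovers the first, which is why the three-parameter model is described as an extension of the two-parameter one.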

If θ is the trait parameter and x is a general test score function
of the response vector, then the information function I_x(θ) is defined
as:



The information function is a useful tool in evaluating the performance
of a test.
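In Lord's work the information function of a score x is usually written as the squared slope of the regression of x on θ divided by the conditional variance of x. The block below is a reconstruction in that standard notation, offered as an assumption about the form intended here rather than as the report's own display.

```latex
% Information function of a score x at ability level theta (standard form; assumed)
I_x(\theta) =
  \frac{\left[ \dfrac{\partial}{\partial \theta}\, E(x \mid \theta) \right]^{2}}
       {\operatorname{Var}(x \mid \theta)}
```

The numerator measures how sharply the expected score changes with ability, and the denominator measures how noisy the score is at that ability level, so a score is informative where it is both sensitive and stable.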

Since the classical notions of reliability are insufficient for
latent trait theory, item-selection strategies and ability-estimation
procedures are often compared through the use of information functions
(Lord and Novick, 1968). While the concept of an information function is
mathematically precise, its properties have great intuitive appeal.

In particular, for appropriate models of P(θ), the maximum likelihood
estimate of θ has an information function inversely proportional to the
square of the length of the confidence interval for estimating the
ability parameter θ. The higher the information function, the more
precise the estimate of ability. By comparing information functions of
tests, it is possible to determine which test yields the greatest
precision of measurement at different levels of ability.

The information function for a conventional test using the maximum
likelihood estimate of θ is proportional to the number of items in the
test (Lord and Novick, 1968). This allows any comparison between
information functions for a conventional test and a tailored test to be
discussed in terms of the number of items that must be added to or deleted
from the conventional test to obtain the same amount of information
available from a tailored test, at various ability levels.
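The proportionality just described leads to a simple equivalent-length calculation, sketched below. The symbols n, I_C, I_T, and n* are introduced here for illustration only, the added items are assumed to be comparable in quality to those already in the conventional test, and the numbers in the worked line are hypothetical.

```latex
% n        : number of items in the conventional test
% I_C, I_T : information functions of the conventional and tailored tests
% n^*      : length the conventional test would need to match the tailored test at theta
n^{*}(\theta) = n \cdot \frac{I_T(\theta)}{I_C(\theta)},
\qquad
\Delta n(\theta) = n^{*}(\theta) - n

% Hypothetical example: n = 50 and I_T / I_C = 1.5 at some theta
n^{*} = 50 \times 1.5 = 75, \qquad \Delta n = 25 \text{ additional items}
```

A negative value of Δn at some ability level would indicate that the conventional test could be shortened there and still match the tailored test.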

An excellent review of the material in this section will be found in
Sympson (1977).

1.4 Rationale for Adaptive Testing

Paper-and-pencil tests are generally designed to measure a reasonably
wide range of abilities. Consequently, such tests include a range of
item difficulties to permit them to discriminate among a diverse
population of testees. Unfortunately, the need to restrict the test to a
reasonable length results in fewer items at the extremes than in the
middle of the range. This restriction means that conventional tests are
most precise at the center of their range of measurement, and precision of
measurement declines toward the extremes.

Ideally, a test would be "tailored" to an individual and would
comprise test items that are clustered around the individual's
ability level. The more closely a test approximates this ideal, the
greater will be its precision of measurement. Computerized adaptive
testing employs iterative techniques to select items which cluster around
the individual testee's ability level. Although various adaptive
strategies exist (see section 1.5), they generally follow a similar pattern:

1) An initial estimate of the testee's ability level is made in
some convenient way (e.g., grade level, age).

2) The ability estimate is used to select an appropriate item from
the item pool.

3) The item is scored correct or incorrect, and an estimate of the
testee's ability level is calculated.

4) If the estimate is sufficiently precise, the procedure is
terminated. Otherwise the estimate is further refined by
returning to step 2.
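The four steps above can be sketched as a short Python loop. This is a generic illustration, not the BRTT procedure: it assumes a logistic model with equal discrimination for every item, a grid-search maximum likelihood estimate, and a stopping rule based on the standard error, and all names and default values (adaptive_test, se_target, the 0.4 threshold) are hypothetical.

```python
import math

D = 1.7  # conventional logistic scaling constant (assumption)


def p_correct(theta, b):
    """Logistic probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-D * (theta - b)))


def log_likelihood(theta, responses):
    """Log-likelihood of (difficulty, correct) pairs at ability theta."""
    total = 0.0
    for b, correct in responses:
        z = D * (theta - b)
        # log P = -log(1 + e^-z); log(1 - P) = -log(1 + e^z)
        total -= math.log1p(math.exp(-z if correct else z))
    return total


def standard_error(theta, responses):
    """Asymptotic standard error: 1 / sqrt(sum of item information)."""
    info = sum(D ** 2 * p_correct(theta, b) * (1.0 - p_correct(theta, b))
               for b, _ in responses)
    return 1.0 / math.sqrt(info)


def adaptive_test(pool, answer, start=0.0, se_target=0.4, max_items=25):
    """Run the generic adaptive loop over a pool of item difficulties.

    `answer(b)` is a hypothetical callback returning True for a correct
    response to the item of difficulty b.
    """
    grid = [g / 10.0 for g in range(-40, 41)]   # candidate ability values
    theta = start                               # step 1: initial estimate
    remaining = list(pool)
    responses = []
    while remaining and len(responses) < max_items:
        # Step 2: choose the unused item closest in difficulty to the estimate.
        b = min(remaining, key=lambda d: abs(d - theta))
        remaining.remove(b)
        # Step 3: score the response and re-estimate ability.
        responses.append((b, answer(b)))
        theta = max(grid, key=lambda t: log_likelihood(t, responses))
        # Step 4: stop once the estimate is sufficiently precise.
        if standard_error(theta, responses) <= se_target:
            break
    return theta, responses


# Illustrative testee who answers correctly whenever difficulty is at most 1.0.
pool = [i / 4.0 for i in range(-12, 13)]
estimate, administered = adaptive_test(pool, lambda b: b <= 1.0)
print(estimate, len(administered))
```

Selecting the item nearest the current estimate is only one possible selection rule; it is used here because it keeps the sketch short while still showing how each response narrows the estimate.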

Kreitzberg, Stocking, and Swanson (1978) have recently reviewed the
status of computerized adaptive testing and have enumerated its potential
advantages in terms of its properties.

Perhaps the major advantage of adaptive testing is that, in general,
fewer items are required to achieve a specified level of measurement
accuracy than are required in a conventional test. Numerous research
studies (cf. Lord, 1970) have confirmed this. The increased efficiency
of an adaptive test occurs because the most information is obtained about
an examinee if the items administered have a 50% probability of being
answered correctly (65% of the time if guessing is taken into
consideration, for a five-choice item) (McBride, 1976). Items which are
too easy or too hard for a given individual contribute little information
about the examinee (Sympson, 1977). Since the purpose of adaptive testing
is to choose and administer those items which contribute most to the
estimate of an individual's ability level, fewer items are required to
achieve the same level of measurement precision. The information function
of an adaptive test is higher at any point on the ability scale than that
of a conventional unpeaked test, and higher at the extremes than a
conventional peaked test. It is also less variable throughout the ability
range (Lord, 1977b).
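The 50% and 65% figures can be checked numerically under the three-parameter logistic model. The sketch below assumes the standard item information formula for that model, a guessing parameter of c = 1/5 for a five-choice item, and the conventional scaling constant D = 1.7; the function names and the grid search are illustrative choices, not part of the study.

```python
import math

D = 1.7  # conventional logistic scaling constant (assumption)


def p3pl(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def item_information(theta, a, b, c):
    """Standard 3PL item information: D^2 a^2 (Q/P) ((P - c)/(1 - c))^2."""
    p = p3pl(theta, a, b, c)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2


def most_informative_p(a, b, c, lo=-6.0, hi=6.0, steps=12001):
    """Probability of success at the ability level where the item is most informative."""
    grid = (lo + (hi - lo) * i / (steps - 1) for i in range(steps))
    best_theta = max(grid, key=lambda t: item_information(t, a, b, c))
    return p3pl(best_theta, a, b, c)


print(round(most_informative_p(1.0, 0.0, 0.0), 3))   # no guessing
print(round(most_informative_p(1.0, 0.0, 0.2), 3))   # five-choice item
```

Under these assumptions the first value comes out at .5 and the second at about .65, and the result does not depend on the particular a and b chosen, which is consistent with the figures quoted from McBride (1976).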

Improvements in measurement precision have been established
theoretically and verified in simulation studies. The amount of
improvement to be expected with a given test depends on the size and
characteristics of the item pool. As an example, Urry (1976) suggests a
roughly five-to-one (80%) reduction in the number of items required to
achieve reliabilities comparable to conventional test scores.

As a consequence of its higher and less variable information function
throughout the ability range, an adaptive test is particularly superior
to a conventional test at the extremes of ability. This situation is
depicted in Figure 1.4a. The wider the range of ability being measured,
the greater this discrepancy. Since underlying ability is not usually
directly measurable, this result cannot be verified empirically; however,
it has been demonstrated theoretically (Lord, 1977a). It is a particularly
important advantage with respect to testing lower-ability students, since
in a conventional test the accuracy of such measurements is virtually
swamped by random error introduced by guessing.

Another consequence of the higher and less variable information
function of an adaptive test is that scores better reflect the true
distribution of ability in a population. Weiss (1975) has demonstrated
this in simulation experiments. This is important when group, as well as
individual, characteristics are of interest.

The latent trait theory underpinning adaptive testing contributes
another important advantage: scores based on latent trait theory are on
an interval scale. McBride (1976) notes that scores based on classical
test theory are on an ordinal scale. Thus the magnitude of differences
between scores has a natural meaning in latent trait theory, but not in
classical test theory.

There is some evidence to suggest that scores on adaptive tests have
greater temporal stability (test-retest reliability). Weiss (1973) cites
results of live testing experiments that indicate this, and claims that
simulation studies show that it holds over the entire ability range.

Finally, computerized adaptive testing may reduce some of the random
error in conventional tests owing to the confounding of power conditions
with speededness. It has often been noted that, because of administrative
requirements, some element of speededness is frequently introduced into
group-administered power tests. Weiss (1973) cites data showing that
speededness differentially affects individuals, thus confounding the
predictive qualities of the test. This problem can be virtually eliminated
with computerized adaptive testing, since administration of the test is
individualized, and time limits can be controlled by the examiner.

Administrative Effects 

A great deal of attention has been given in both group and individually
administered testing to standardization of the testing environment,
control of administrator effects, objectivity of scoring, and security of
materials. Computerized adaptive testing offers potential advantages
over conventional testing in all of these areas.

Urry (1975) points out that computerized testing is more standardized
than conventional testing because the administrative procedures are
programmed and, therefore, more uniform and controlled. This reduces
differential effects of the testing environment. In individually
administered tests, studies have shown that administrator effects and
clerical errors in scoring may seriously compromise test objectivity
(Weiss, 1973). For example, factors such as expectancy, knowledge of the
testee, degree of rapport, and race have all been shown to influence
individual scores. Even in group administrations, the examiner may induce
different levels of stress in different individuals (Weiss, 1973). Since
computerized testing eliminates the human examiner and precisely controls
administration and scoring, these effects should be better controlled, if
not eliminated.

Characteristics of the answer sheet and item arrangement have also
been shown to affect group scores, as well as differentially affect
individual scores (Weiss, 1973). This compromises the psychometric
qualities of the test. In computerized testing these effects are
eliminated. They are, however, replaced by a new set of factors relating
to the interface between the individual and the testing device. Because of
the relative newness of the field, these factors have been largely
unexplored. However, computerized testing provides a degree of control
impossible in a conventional testing environment. This should make it
practical to easily modify the testing environment as new evidence
regarding the effects of that environment is uncovered.

Computerized testing should make it easier to safeguard the security
of test materials. It has been argued (Wood, 1973) that, since test
booklets are no longer needed, and since different individuals receive
different items, the security problem will be diminished. This assumes
that adequate procedures to safeguard the integrity and accessibility of
the computer have been developed. As computer security systems continue
to improve, this should be the case.

Computerized testing provides significant advantages in the scheduling
of test administrations. Tests can be administered at times and locations
convenient to the student. For example, walk-in test centers become
possible. Even if test security requires that all administrations be
simultaneous, it may be possible to locate terminals at the convenience of
the student and virtually eliminate the administrative procedures involved
in registration and arrangements for test centers.

Finally, many of the other administrative procedures required for
conventional group-administered testing can be reduced or eliminated with
computerized testing. The list includes: test booklet and answer sheet
printing, storage and distribution; accounting for and return of materials;
answer sheet processing; certain aspects of score reporting; and test
center management. These administrative procedures are, of course,
replaced by others required for computer administration of tests.
However, the latter should, once established, be simpler and less costly
to carry out on a routine basis.

Affective Factors 

Little research has been conducted to date on the affective
implications of computerized adaptive testing. Some researchers have
hypothesized several advantages of computerized testing in terms of its
effects on the testee. Chief among them is that it may increase the
student's interest in and motivation for taking the test. Johnson and
Mihal (1973) report results showing that black examinees performed better
on a computerized test, and suggest that motivational and examiner effects
might have been responsible. Weiss (1975) found similar effects when
feedback on the correctness of a response is provided. These results
suggest that the computerized testing environment may in some cases be
more motivating or less anxiety-producing than the conventional testing
environment.

It has also been hypothesized that, because items better match the
ability level of the testee, adaptive testing may have a positive effect
on the attitudes of high- and low-ability students. Weiss (1975) suggests
that the high-ability student may be bored by a conventional test, with a
resulting deterioration in performance. The low-ability student may be
similarly affected by the frustration and anxiety resulting from attempting
items that are overly difficult. In addition, there is evidence that
low-ability students guess more frequently, thus introducing greater error
into the score (Weiss, 1973). Computerized adaptive testing should tend to
reduce these negative factors.

Limitations of the Group-Administered, Multiple-Choice Mode of Testing

Computerized adaptive testing potentially offers several other
advantages over conventional testing methods. These advantages result
from the power and flexibility inherent in computer administration of a
test. One advantage is the ability to gather and report additional
information about the testing process that cannot readily be gathered in
conventional paper-and-pencil testing. For example, Weiss (1975) developed
a measure he refers to as a "consistency" index on each student. This
measure is, roughly, the number of strata or difficulty levels administered
to the student. Weiss showed that this measure is generally correlated
with the test-retest reliability of the student's score. If this is
so, then reporting this measure may provide additional information
helpful in evaluating the student's score.

Another example of additional information that can be gathered is
response latency: the time it takes a student to answer an item. Green
(1970) suggested that response latency may be related to guessing.
Additional research will be needed to determine the value, if any, of
latency information.

Computerized adaptive testing provides an opportunity for greater
flexibility in the testing process itself. For example, students can be
given feedback on the correctness of their responses. Weiss (1975)
showed that black students tend to score better and omit fewer items when
feedback is provided. Students can also be permitted to re-try an item
after an incorrect response. This may be useful in analyzing or weighing
wrong answers. Finally, information about the testing session, including
the student's score(s), can be provided immediately. This may
be beneficial to the users of test scores, as well as to the students
themselves.

Computerized testing more readily permits alternatives to the
multiple-choice item type than does conventional testing. These
alternatives might include free or constructed response items, and
probabilistic response items (ones in which the student assigns weights or
priorities to the choices). While such alternatives can be done with paper
and pencil, they are difficult to administer and score. Computerized
testing may, therefore, open up new approaches to item design.

1.5 Strategies of Adaptive Testing

This section provides a brief summary of some of the major strategies
which have been employed for selecting items in an adaptive test.
Weiss (1974) and McBride (1976) have published extensive reviews of
adaptive-testing strategies.

Two-Stage

One of the simplest adaptive-testing strategies, the two-stage
strategy, has been previously discussed. Typically, the two-stage
strategy employs a routing test which provides an initial estimate of
the individual's ability. Based on the score attained on the routing
test, the individual is then branched to a measurement test appropriate
to his or her ability level. A major advantage of the two-stage strategy
is that it can be used with paper-and-pencil testing provided that it is
administratively feasible to score the routing test before providing the
examinee with the second-stage test. The major disadvantage of this
strategy is that misrouting due to error has serious measurement
consequences.

Pyramidal

Pyramidal models employ items which are structured in a tree as
illustrated in Figure 1.5a. The testee begins at the top of the
pyramid and is administered the initial item, which is then scored
correct or incorrect. Following an incorrect response, the testee is
branched to an item of lower difficulty; following a correct response,
the testee is branched to an item of higher difficulty. Lord's (1971)
flexilevel item strategy, discussed previously, is a variant of a
pyramidal branching model. Many other variations are possible and have been
studied. Pyramidal models are somewhat sensitive to errors of
measurement, as a correct guess or an incorrect response due to a confounding
variable (for example, failing to understand a key word in a nonverbal
item) may affect the reliability of the final score. Weiss (1974) has
pointed out that the pyramidal model does not guarantee that items will
cluster at the 50 percent probability of a correct response, as desired
for maximum information yield.
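
The branching rule just described can be sketched in a few lines of Python.
This is only an illustration, not code from the study: the function name
`pyramidal_path`, the use of integer difficulty levels, and the fixed step of
one level per response are all assumptions made for the example.

```python
def pyramidal_path(responses, start_level=0):
    """Trace the difficulty levels visited in a simple pyramidal test.

    responses   -- iterable of booleans, True for a correct answer
    start_level -- hypothetical integer label for the item at the top
                   of the pyramid (higher numbers mean harder items)

    Returns the list of difficulty levels of the items administered.
    """
    level = start_level
    visited = []
    for correct in responses:
        visited.append(level)
        # Assumed rule: move one level harder after a correct response,
        # one level easier after an incorrect response.
        level += 1 if correct else -1
    return visited


if __name__ == "__main__":
    # Correct, correct, incorrect, correct
    print(pyramidal_path([True, True, False, True]))
```

A single lucky guess shifts every later item one level upward in this
scheme, which is the sensitivity to measurement error noted above.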

Stradaptive 

The stradaptive strategy developed by Weiss (1973) employs an item
pool which is divided into strata based on item difficulty. The initial
item is selected on the basis of a prior estimate of the individual's
ability, and branching occurs between strata. The stradaptive strategy
differs from the pyramidal strategy in that branching is from stratum
to stratum rather than from item to item, although the same types of
branching rules may be employed in both.

Figure 1.5a A Pyramidal Item Structure (figure adapted from Weiss, 1974)

Bayesian

A number of investigators have explored Bayesian models for item
selection. The general procedure used in the Bayesian model is to obtain
a prior estimate of the testee's ability along with an estimate of the
uncertainty. The item in the pool selected is the one which will most
reduce the uncertainty of the ability estimate. Following administration
of the item, the prior ability and uncertainty estimates are revised
to yield a posterior ability estimate. This posterior estimate replaces
the prior estimate for the next iteration. The procedure may be
continued until a posterior ability estimate has been obtained which has
the desired degree of accuracy. Novick (1969) and Owen (1975) have
proposed Bayesian procedures for adaptive testing.
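
The prior-to-posterior cycle described above can be illustrated with a small
Python sketch. It is not the procedure of Novick (1969) or Owen (1975): it
swaps in a simple grid approximation to the ability distribution, and it
assumes a three-parameter logistic item model with scaling constant D = 1.7,
a standard normal prior, and items given as hypothetical (a, b, c) tuples.

```python
import math

# Assumed ability grid from -4 to 4 in steps of 0.1.
GRID = [-4.0 + 0.1 * k for k in range(81)]


def p3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def normal_prior(mean=0.0, sd=1.0):
    """Discretized normal prior over GRID (weights sum to 1)."""
    w = [math.exp(-0.5 * ((t - mean) / sd) ** 2) for t in GRID]
    total = sum(w)
    return [x / total for x in w]


def update(prior, item, correct):
    """Posterior over GRID after observing one scored response."""
    a, b, c = item
    post = []
    for t, w in zip(GRID, prior):
        p = p3pl(t, a, b, c)
        post.append(w * (p if correct else 1.0 - p))
    total = sum(post)
    return [x / total for x in post]


def mean_var(dist):
    """Mean and variance of a distribution over GRID."""
    m = sum(t * w for t, w in zip(GRID, dist))
    v = sum((t - m) ** 2 * w for t, w in zip(GRID, dist))
    return m, v


def expected_posterior_variance(prior, item):
    """Posterior variance averaged over the two possible responses."""
    a, b, c = item
    p_correct = sum(w * p3pl(t, a, b, c) for t, w in zip(GRID, prior))
    v_right = mean_var(update(prior, item, True))[1]
    v_wrong = mean_var(update(prior, item, False))[1]
    return p_correct * v_right + (1.0 - p_correct) * v_wrong


def select_item(prior, pool):
    """Index of the pool item expected to reduce uncertainty the most."""
    return min(range(len(pool)),
               key=lambda i: expected_posterior_variance(prior, pool[i]))
```

In use, the posterior returned by `update` would be passed back in as the
prior for the next call to `select_item`, and the loop would stop once the
variance from `mean_var` fell below a chosen threshold.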

Maximum Likelihood 

The final class of adaptive testing strategies to be considered
are the maximum-likelihood strategies. The maximum-likelihood procedure
is similar to the Bayesian procedure, although the mathematics underlying
it is quite different. The maximum-likelihood procedure requires that
the testee answer at least one item correctly and one incorrectly. When
this has been accomplished, the maximum-likelihood equations can be
solved to yield an ability estimate and standard error. This estimate is
refined iteratively as in the Bayesian procedure. The Broad-Range
Tailored Test of Verbal Ability, which was the subject of the current
study, employed a maximum-likelihood strategy. The structure of this
test is described in detail in Chapter II.

Chapter II

The Broad-Range Tailored Test of Verbal Ability 
2.1 Introduction 

The Broad-Range Tailored Test of Verbal Ability (BRTT) was developed
by Lord (1977a). The test employs an item pool stratified into difficulty
levels and arranged by item type within difficulty. The computer initially
selects specific items by an up-down rule; later items are selected by a
maximum-likelihood algorithm. The BRTT yields an ability score appropriate
to students from the fifth grade level to graduate school. McBride has
characterized the BRTT as "the most ambitious adaptive testing proposal
to appear in the literature, by virtue of the range of ability over which
Lord intended it to be used" (McBride, 1976, p. 54). In designing the
BRTT, Lord investigated about thirty designs for a broad-range tailored
test, administering each to approximately 1000 simulated examinees. The
current study is the first empirical test of its performance.

The BRTT proved quite robust with regard to the selection of the
initial item. As shown in Figure 2.1a, Lord (1977a) found little difference
in the standard error of measurement at 13 different ability levels
related to the difficulty of the initial item.

Lord's simulations included designs whose item matrices contained
more or fewer difficulty strata than the 10 columns (shown in Table
2.1b) employed in the present study. He found that a test with the
same difficulty range, but employing 363 items stratified into 20 groups,
was at least twice as good as the 10-column, 182-item test of Table 2.1b.
These results suggest that selection from a 363-item pool would support
a much better 25-item test than selecting an equal number of items from
a smaller, 182-item pool. Still better tests could be produced by
using still larger item pools, selecting the best 25 items for each
examinee. The item pool size used in the current study was based upon
the number of items available from Lord's (1977a) simulations; it was
chosen for practical reasons rather than theoretical optimality.

Figure 2.1a The standard error of measurement at 13 different
ability levels for four different starting points for the 25-item
broad-range tailored test.

Reproduced by permission from Applied Psychological Measurement,
edited by David J. Weiss. A Broad-Range Tailored Test of Verbal
Ability by Lord, copyright © 1977, Volume 1, Number 1, West
Publishing Company. All rights reserved.

Table 2.1b

Broad-Range Verbal Test Items Arranged by Difficulty Level and Serial Number
(a, b, c, d, e represent different verbal item types.)

[Table body: rows are item serial numbers 1 through 25; columns are item
difficulty levels running from easy to hard, labeled by grade level; each
cell gives an item type (a through e). The cell layout is not recoverable.]

(Table from Lord, 1977)

Reproduced by permission from Applied Psychological Measurement,
edited by David J. Weiss. A Broad-Range Tailored Test of Verbal
Ability by Lord, copyright © 1977, Volume 1, Number 1,
West Publishing Company. All rights reserved.

2.2 Comparison of the BRTT to the PSAT

Lord compared the information yield of the BRTT with the Preliminary
Scholastic Aptitude Test of the College Entrance Examination Board.
Figure 2.2a shows the information function for the verbal score on each
of three forms of the PSAT adjusted to a test length of 25 items,
compared to the information function for the verbal score on the
Broad-Range Tailored Test. In the tailored test the initial item administered
was at a difficulty level appropriate for average college students. The
PSAT information functions were computed from estimated item parameters;
the tailored test information function was estimated from responses of
simulated examinees.

McBride (1976) argued that comparing the BRTT to the PSAT adjusted
to a 25-item length may have been unfair since the BRTT selects the
25 "best" items, whereas the PSAT items have divergent discriminating
power. He suggested that a preferable comparison of BRTT to PSAT
would compare the 25 "best" items of the PSAT, where best is defined as
either the 25 items with most discriminating power or the 25 items with
the most information at a given ability level. Both McBride and Lord
agreed that the results of the simulations were promising and suggested
that the procedure be attempted with actual examinees. The current
study solicited PSAT scores from the students participating for
purposes of the comparison.

Figure 2.2a Information function for the 25-item tailored
test, also for three forms of the Preliminary
Scholastic Aptitude Test (lower lines) adjusted
to a test length of 25 items.

Reproduced by permission from Applied Psychological Measurement,
edited by David J. Weiss. A Broad-Range Tailored Test of Verbal
Ability by Lord, copyright © 1977, Volume 1, Number 1, West
Publishing Company. All rights reserved.

2.3 The Item Pool

The items making up the two forms of the Broad-Range Tailored
Test were selected from five ETS-administered testing programs: the
Graduate Record Examinations (GRE), the Scholastic Aptitude Test in
both standard (SAT) and preliminary (PSAT) forms, the School and College
Aptitude Test (SCAT), and the Sequential Tests of Educational Progress
(STEP).

An initial item pool consisting of 898 items was obtained by
selecting all verbal items that were one of the following item types:
(1) synonyms, (2) opposites, (3) incomplete sentences, (4) word relations,
(5) sentence comprehension. A detailed description of the item pool is
provided in Appendix B.

Estimates of the three item parameters were obtained by the LOGIST
program. The items were placed on a common scale by obtaining previously
computed equatings which related number-right scores among all tests.

The equating of test s to test p was accomplished by employing
LOGIST to compute:

    $\theta_s$ - ability estimates for each person,
    $a_s, b_s, c_s$ - item parameters for each item.

Using the three-parameter logistic model, the probability of a correct
answer to item i given $\theta_s$ is expressed by the equation:

$$P_i(\theta_s) = c_i + \frac{1 - c_i}{1 + \exp(-D a_i (\theta_s - b_i))}.$$

An estimate of the true number-right score, $\xi_s$, is given by

$$\xi_s = \sum_i P_i(\theta_s).$$

Similarly, the analogous quantities $\theta_p$; $a_p, b_p, c_p$;
$P_i(\theta_p)$; and $\xi_p = \sum_i P_i(\theta_p)$ were computed for
test p. The transformation of $\theta_s$ to $\theta_p$ was then
computed using knowledge of the number-right relationship between s
and p to place the item parameters on a common scale.
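
The two quantities defined above can be computed as in the following Python
sketch. It is not LOGIST and not the study's code: the function names are
hypothetical, items are assumed to be (a, b, c) tuples, and the scaling
constant D = 1.7 is a conventional value assumed here because the source does
not state the value used.

```python
import math


def p3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic probability of a correct response.

    a -- discrimination, b -- difficulty, c -- lower asymptote (guessing)
    D -- scaling constant; 1.7 is an assumed conventional value
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def true_number_right(theta, items, D=1.7):
    """Estimated true number-right score: the sum of P_i(theta) over items."""
    return sum(p3pl(theta, a, b, c, D) for a, b, c in items)


if __name__ == "__main__":
    # Two hypothetical items, both of difficulty 0, evaluated at theta = 0.
    items = [(1.0, 0.0, 0.2), (1.0, 0.0, 0.0)]
    print(p3pl(0.0, 1.0, 0.0, 0.2))
    print(true_number_right(0.0, items))
```

When theta equals an item's difficulty b, the probability is halfway between
c and 1, which is a quick sanity check on any implementation of the model.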

2.4 Structure of the Item Pools

The item pools were stored on disk and were indexed by means of
an item-type table. This table was structured as a rectangular array
with k rows and m columns. Each column 1,...,m represented a range or
stratum of ability (θ). Rows 1,...,k indicated an item sequence; the
type of the ith item administered was specified by row i. Table 2.1b,
shown previously, is the item-type table for Form A.

The two item-type tables employed in the current study contained
25 rows and 10 columns each and were developed by randomly splitting
a 20-column table. The pool for Form A contained 163 items; Form B
contained 180 items (Swanson and Stocking, 1977).

The item-type table determined the type of item to be administered
at each step in the assessment procedure. It was needed because there
was an insufficient number of items in the pool to ensure that a
desired item type would be available at every point in the procedure.
Ideally, there would have been only one item type in each row of the
table. In this case, all examinees would take the same sequence of
item types. As can be seen in Table 2.1b, only an approximation to the
ideal case was possible. Controlling the sequence of item types was
intended to enhance the comparability of the latent trait
(unidimensionality) across examinees.

The item-selection algorithm employed in the study had three
phases: (1) selection of initial item, (2) up-down procedure, (3)
maximum-likelihood procedure.

As Lord's (1977a) simulations had suggested the standard error of
the BRTT would be relatively insensitive to the choice of the initial
item, the same initial item was administered to each student. The item
was selected from the first row and middle column of the table. For Form
A, b = 1.38; for Form B, b = 1.47.

The maximum-likelihood estimation requires that the response
vector contain at least one correct and one incorrect response. Following
the initial item, a simple up-down rule was employed, raising or lowering
the difficulty level of successive items until the complementary response
was obtained. The current study employed a step size of one column.
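
The up-down phase can be sketched as follows. This is an illustration under
stated assumptions rather than the study's FORTRAN code: columns are assumed
to be numbered from 0 (easiest) upward, the function name and arguments are
hypothetical, and holding at the edge column when the table runs out is an
assumption the source does not address.

```python
def up_down_columns(responses, start_col, n_cols=10, step=1):
    """Columns of the items administered during the up-down phase.

    responses -- iterable of booleans in order of administration
                 (True for correct); the first entry is the initial item
    start_col -- column index of the initial item (0 is easiest)
    n_cols    -- number of difficulty columns in the item-type table
    step      -- step size in columns

    The phase ends as soon as a response differs from the first one,
    so that the vector holds one correct and one incorrect response.
    """
    col = start_col
    visited = []
    first = None
    for correct in responses:
        visited.append(col)
        if first is None:
            first = correct
        elif correct != first:
            break  # complementary response obtained
        # Move harder after a correct response, easier after an
        # incorrect one, staying inside the table (assumed behavior).
        if correct:
            col = min(n_cols - 1, col + step)
        else:
            col = max(0, col - step)
    return visited
```

For example, a student who answers the first two items correctly and the
third incorrectly sees items from three successively harder columns before
the maximum-likelihood phase takes over.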



Once the up-down procedure resulted in at least one item correct
and one incorrect, the maximum-likelihood estimation (MLE) procedure was
used. The MLE algorithm operated as follows:

    1. Determine the type of item to be administered by
       consulting the next row of the item-type table
       and selecting the column in which

           $b < \hat{\theta}$ and
           $|b - \hat{\theta}|$ was smallest.

    2. Select the most discriminating item of the
       appropriate type and difficulty remaining in
       the pool and administer it.
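
The two steps above, together with the ability estimate they depend on, can
be sketched in Python. Everything here is illustrative: the study solved the
likelihood equations, whereas this sketch swaps in a plain grid search; the
search range of -6 to 6, the constant D = 1.7, the dictionary layout of pool
items, and the fallback to the easiest column are all assumptions.

```python
import math


def p3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def mle_theta(administered, lo=-6.0, hi=6.0, steps=1200):
    """Grid-search maximum-likelihood estimate of theta.

    administered -- list of ((a, b, c), correct) pairs containing at
                    least one correct and one incorrect response
    """
    best_theta, best_ll = lo, -math.inf
    for k in range(steps + 1):
        theta = lo + (hi - lo) * k / steps
        ll = 0.0
        for (a, b, c), correct in administered:
            # Clamp to keep the logarithm finite at extreme theta.
            p = min(max(p3pl(theta, a, b, c), 1e-12), 1.0 - 1e-12)
            ll += math.log(p if correct else 1.0 - p)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta


def select_next_item(theta_hat, row, column_b, type_table, pool, used):
    """Pick the next item following steps 1 and 2 of the MLE phase.

    column_b   -- hypothetical representative difficulty of each column
    type_table -- type_table[row][col] is the required item type
    pool       -- list of dicts with keys "id", "col", "type", "a"
    used       -- set of ids already administered (updated in place)
    """
    below = [j for j, b in enumerate(column_b) if b < theta_hat]
    if below:
        # Nearest column whose difficulty lies below the estimate.
        col = min(below, key=lambda j: abs(column_b[j] - theta_hat))
    else:
        # Assumed fallback: use the easiest column.
        col = min(range(len(column_b)), key=lambda j: column_b[j])
    wanted = type_table[row][col]
    candidates = [it for it in pool
                  if it["id"] not in used
                  and it["col"] == col and it["type"] == wanted]
    if not candidates:
        return None
    best = max(candidates, key=lambda it: it["a"])  # most discriminating
    used.add(best["id"])
    return best
```

After each response, `mle_theta` would be called again on the enlarged
response list and the new estimate passed to `select_next_item` with the
next row of the table.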

2.5 Implementation of the Broad-Range Tailored Test

The Broad-Range Tailored Test was implemented on a PDP-11/40
computer system. A technical description of the system is provided in
Swanson and Stocking (1977). This section presents an overview of the
system's structure.

The following goals were established for the system design:

    1. The system was to be designed in a flexible
       modular fashion to permit alteration of item
       pools, selection strategies, stopping rules,
       human-computer interaction protocols, and data
       collection strategies with minimal effort. The
       purpose of this objective was to facilitate the
       system's use in a variety of environments.

    2. The system was to be coded in ANSI FORTRAN with
       minimal dependency on characteristics of the
       PDP-11/40 computer. The purpose of this objective
       was to facilitate transportability of the software
       to other computers.

    3. The system was to be as independent of the
       Broad-Range Tailored Test as possible. The design of
       the system should permit most parameters to be
       specified at run time.

    4. The human-computer interaction protocols were to
       be simple and natural. The student should not
       perceive computerized administration as a barrier
       to overcome.

Figure 2.5a shows the file structure of the CAT system. The system
employs five files. The first file is the item pool. Each item stored
in the pool is assigned a number. The system builds an index to the
pool. The index contains information about each item including: the
key, the item parameters (a, b, c), and the item type. Also contained
on the item pool file are the instructions which explain how the item
is to be answered. The system provides for two levels of instructions
for each item type. The first level is a long form; the second level
is a terse form. If specified, the system will present the long-form
instructions when the testee first encounters the item type and will
present the second form on subsequent presentations of the same type.
This reduces the testee's reading load.

The second file is called the test specification file. This file
contains information which directs the system as to how the test is to be
presented. Among the data in this file are: the item pool to be used,
the item selection strategy to be employed, the scoring method to be
employed, number of items to be administered, and feedback and re-try
specifications. The file contains multiple test specifications; the
choice of which is used is made when the test is actually administered.

The third file is called the message file. It contains messages to
be displayed on the terminal if errors arise.

The fourth file is called the instructional file. It contains
instructional frames that teach the student how to use the system.
This feature was not used in the present study.

The fifth file is called the log file. The system writes a record
to this file following each item. The file contains such information
as: the testee identification, the item administered, the response,
the response latency, and the current ability estimate. This file was
used for data collection purposes in the current study. The system
can be instructed to write log files at several levels of detail.

The items are displayed on a terminal connected to the computer
via telephone lines. The terminal employed was a DEC VT52 cathode-ray
tube display terminal. This terminal has a screen which displays 24
lines of 80 characters. It has a full typewriter keyboard and a small
keypad containing 19 keys.




[Diagram labels: File; PDP-11/40 Computer; Modem; Terminal; Modem; Terminal]

Figure 2.5a File Structure of the CAT System

To simplify the human-computer interaction, special keycaps were
ordered for the small keypad. These keycaps are shown in Figure 2.5b.
The student indicated his or her response by pressing the appropriate
key. An asterisk appeared next to the corresponding option on the
screen. The student could alter the response by pressing a different key
or go on to the next item by pressing the key marked "enter." The
"retrans" key instructed the computer to retransmit the item in case
noise in the telephone system produced a garbled display.

Figure 2.5b Response Keypad Used in the Study



Chapter III

Empirical Performance of the Broad-Range Tailored Test

3.1 Subjects and Method

In his article reporting on results of the simulation studies of
the Broad-Range Tailored Test (BRTT), Lord (1977) recommended the
administration of the test to a live population, a suggestion echoed by
McBride (1976). This study was designed to explore the empirical
performance of the BRTT. It involved two major tasks: (1) the development of
a computer system capable of administering two forms of the BRTT and (2)
the administration of the BRTT to a population of students. This study,
which describes the experiment, is supplemented by the description of the
computer system reported by Swanson and Stocking (1977).

The range of individuals for whom the BRTT is appropriate is quite
wide; it yields a score from the fifth grade to graduate level. However,
a more homogeneous population was selected for this study, so that we
could compare the performance of the BRTT with a conventional test of more
limited range. Since prior simulations of the BRTT involved comparisons
with the PSAT, the present study employed a comparable high school population.
The inability of the PDP-11 computer system to support more than two "dial-up"
terminals at a time significantly limited the number of individuals to whom
the test could be administered. Therefore, an unselected population of
high school students was used, drawing as many as possible from the eleventh
grade. Each student received two forms of the BRTT; PSAT scores were obtained
for those students who had taken the test as part of their academic work.
Our target sample size was 150 students.

Frequency distributions of all high schools in Monmouth, Middlesex,
Mercer, Hunterdon, and Somerset counties (NJ) were obtained for the number
of students in the eleventh grade, the number of students who elected to
take the PSAT, and the distribution of PSAT scores within school for the
year [illegible]. This survey was conducted to select institutions with a
reasonably wide and representative range of student abilities. Two of the
six schools initially contacted agreed to participate in the experiment:
Princeton High School and South Brunswick High School. Princeton High
School requested that students not be paid for participation in the study,
while South Brunswick requested that students be paid a fee of $3.50.

Sample Characteristics

One hundred forty-six students from the two schools participated in
the study (Princeton N=80; South Brunswick N=66). Seventy-one of the
students were males and seventy-five were females.

Experimental Design and Procedure

Each student received two forms of the BRTT, administered in a single
class period. The order of the forms was counterbalanced within sex, as
shown below:

                  Form

    Male         A    B
    Female       A    B
    Male         B    A
    Female       B    A

Assignment of students to order was performed randomly within sex.
Upon entering the testing room, the student was "signed on" the
computer by a proctor who explained the use of the terminal. The student
then responded to the 25 items selected by the computer. Following
completion of the first form of the BRTT, the proctor initiated the
alternate form and the student responded to an additional 25 items.
The student was then asked to complete the posttest questionnaire.

3.2 Overview of the Analyses

Conceptually, the data analyses presented in this chapter may be
divided into four units. The first unit is descriptive and presents
characteristics of the observed data. In particular, Section 3.3
describes the scores and score transformations derived from the data;
Section 3.4 presents frequency distributions of the scores; and Section 3.5
analyzes the distributional characteristics of the scores to determine
whether they meet the normality assumptions for correlational statistics.

The second unit, which presents information functions for both forms
of the BRTT, is directly related to simulation data reported by Lord
(1977a). The information functions are presented in Section 3.6.

The third unit of data analyses involves the reliability and
validity of the BRTT. Section 3.7 shows the comparison of the
reliability coefficients for the scores and transformations. The correlations
between the BRTT scores and the PSAT verbal scores are presented as a
measure of concurrent validity. A series of tests on the mean scores
from Forms A and B are presented, which bear on the parallelism of the
two forms. The reliabilities presented in Section 3.7 were computed at
the final (25th) item. The reliabilities of the BRTT and the PSAT at
test lengths of 1, 2, ..., 25 items are compared in Section 3.8, and are
presented in the form of plots. Discussion of the likelihood function and
the maximum-likelihood estimator is presented in Section 3.9.

The fourth unit of data analyses involves the performance of the
maximum-likelihood estimator (MLE). Section 3.10 presents analyses of
the MLE and demonstrates that, overall, the procedure performed as
expected. Section 3.11 is a Monte Carlo analysis of the MLE procedure,
in which the examinee's observed responses are compared with responses
obtained by simulation based on the estimated θ. Section 3.12 presents
the examinee response patterns which resulted in anomalous ability
estimates.

The findings presented in this chapter are summarized in Section 3.13.

3.3 Observed Scores and Score Transformations

The characteristics of nine score variables derived from the data on
the computer log files were examined. The scores are defined as follows:

1. θ. Theta is the parameter commonly used to denote the latent
   trait. Each examinee is assumed to possess a fixed value
   for θ, which is estimated from his or her responses to test
   items.

2. Ω. Omega is a monotone transformation of θ
   which was proposed by Lord (1975). Lord proposed this
   transformation because of an observed correlation among
   item parameters defined on the θ scale. The transformation
   is:

$$\Omega(\theta) = \int R(a \mid b)\, db$$



   where R(a|b) is the observed regression of a, the discrimination,
   onto b, the difficulty. The transformation eliminates
   correlation between the a and b parameters. The regression
   was taken to be linear in the variable b, and the resulting
   transformation Ω(θ) is an eighth-degree polynomial in θ
   with coefficients (starting with the constant, ending with
   the eighth power):

       x     Coefficient of θ^x

       0      0.01?16364510799269000
       1      0.527818498?9492630000
       2      0.02985677403820969000
       3     -0.00140773413450821900
       4     -0.00039316355724225770
       5     -0.00003302234217826338
       6     -0.00000142120205848321
       7     -0.00000003151838404393
       8     -0.00000000028208952766

   (A "?" marks a digit that is illegible in the source.)

3. Number right. The common score employed in paper-and-pencil tests.
   Other scores related to number right that are occasionally
   referenced are number wrong and number omitted.

4. p. Expected proportion right is the true score transformation
   for the test (Lord and Novick, 1969, p. 387), and is
   computed over the entire item pool by the formula:

$$p = \frac{1}{N} \sum_{i=1}^{N} P_i(\theta)$$

   where N is equal to the number of items in the pool, and $P_i(\theta)$
   is the probability of correct response to the ith item at ability θ.

5. b_total - mean difficulty over all items administered.

6. b_correct - mean difficulty over all items answered correctly.

7. b_high - highest difficulty of all items answered correctly.

8. b_final - difficulty of the final item administered.

9. S, T - are a weighted number correct score and an expected weighted
   number correct score. They are given by the formulas:

$$S = \sum_{i=1}^{25} w_i(\theta)\, u_i \quad \text{and} \quad T(\theta) = \sum_{i=1}^{25} w_i(\theta)\, P_i(\theta)$$

   where $w_i(\theta)$ = [illegible]

   and $u_i$ = 1 if the individual responded correctly to
   the item, 0 otherwise.
   These scores serve as a final check on the adequacy of
   the algorithm for estimating θ.

3.4 Frequency Distribution of Scores

Frequency distributions for θ, Ω, number correct, number wrong,
number omitted, average difficulty of items answered correctly, and
expected proportion correct over the entire item pool are presented
in this section.

Eleven cases have been deleted from the present and subsequent
analyses. In Form A, six individuals obtained θ scores of
[sign illegible]6.00. These individuals were excluded from the analyses
because they represented anomalous cases whose scores were not directly
comparable to the remainder of the subjects. Five additional cases were
excluded after finer analysis revealed several anomalies either in
the individuals' responses or in the computation of θ by the computer
system. (See Section 3.12 for a discussion of the anomalous
protocols.)

Table 3.4a summarizes the frequency distribution for θ computed
at the final item for Forms A and B of the BRTT. In both cases, the
majority of the examinees scored in the range -0.5 to 2.5. Table 3.4b
summarizes the frequency distribution of Ω for Forms A and B; Table 3.4c
summarizes the number of items answered correctly for Forms A and B.
For both scores, the Form A and Form B distributions appear roughly
comparable.

Table 3.4d summarizes the distribution of the number of items
answered incorrectly for both forms of the BRTT; Table 3.4e summarizes the
distribution of the number of items omitted for the two forms. Note that
for both forms, at least 30% of all examinees omitted one or more items.
Additional techniques for scoring omitted items and allowing examinees
to review should be important areas of research in the future, since
some examinees have a high tendency to omit items.

Table 3.4f summarizes the distribution of the average difficulty
of items answered correctly; Table 3.4g summarizes the distribution of
the expected proportion of items answered correctly if the examinee
answered all the items in the pool. The Form B distribution of each
score is slightly more dispersed than the Form A distribution of the
corresponding score. Sections 3.7 and 3.8 provide information which
bears on the comparability of the two forms. Table 3.7c presents the
frequency distributions of the PSAT scores.



TABLE 3.4a
Frequency Distribution of θ

                     Form A              Form B
Interval           Freq.     %†        Freq.     %

 4.0 -  3.3           1     0.7           1     0.7
 3.3 -  2.6           4     3.0           1     0.7
 2.6 -  1.9          20    14.8          26    19.3
 1.9 -  1.2          42    31.1          30    22.2
 1.2 -  0.5          30    22.2          38    28.1
 0.5 - -0.2          26    19.3          25    18.5
-0.2 - -0.9           9     6.7          11     8.1
-0.9 - -1.6           2     1.5           2     1.5
-1.6 - -2.3           0     0.0           1     0.7
-2.3 - -3.0           1     0.7           0     0.0

N                   135                 135
Mean             1.0702              0.9829
SD               0.9598              0.9550
Minimum Value   -2.6U8              -1.7689
Maximum Value    3.7837              3.3755

† Total of percentages throughout is not 100 due to rounding.

TABLE 3.4b
Frequency Distribution of Ω

                     Form A              Form B
Interval           Freq.     %         Freq.     %

 2.5 -  2.1           1     0.7           0     0.0
 2.1 -  1.7           2     1.5           2     1.5
 1.7 -  1.3          10     7.4          13     9.6
 1.3 -  0.9          25    18.5          26    19.3
 0.9 -  0.5          50    37.0          32    23.7
 0.5 -  0.1          26    19.3          32    23.7
 0.1 - -0.3          16    11.9          24    17.8
-0.3 - -0.7           4     3.0           4     3.0
-0.7 - -1.1           0     0.0           2     1.5
-1.1 - -1.5           1     0.7           0     0.0

N                   135                 135
Mean             0.6359              0.5852
SD               0.5417              0.5442
Minimum Value   -1.1453             -0.8165
Maximum Value    2.2567              2.0191

TABLE 3.4c
Frequency Distribution of Number of Items Correct
for Forms A and B

Number               Form A              Form B
Correct            Freq.     %         Freq.     %

  22                  1     0.7           0     0.0
  21                  1     0.7           3     2.2
  20                  1     0.7           2     1.5
  19                 10     7.4           3     2.2
  18                  8     5.9          15    11.1
  17                 20    14.8          19    14.1
  16                 17    12.6          18    13.3
  15                 18    13.3          22    16.3
  14                 27    20.0          23    17.0
  13                 13     9.6          14    10.4
  12                 13     9.6           8     5.9
  11                  3     2.2           6     4.4
  10                  1     0.7           1     0.7
   9                  2     1.5           1     0.7

N                   135                 135
Mean            15.1333             15.2296
SD               2.4089              2.3434

TABLE 3.4d
Frequency Distribution of Number of Items Incorrect
for Forms A and B (N = 135)

Number               Form A              Form B
Incorrect          Freq.     %         Freq.     %

  16                  1     0.7           0     0.0
  15                  1     0.7           0     0.0
  14                  1     0.7           3     2.2
  13                  6     4.4           2     1.5
  12                  9     6.7           8     5.9
  11                 12     8.9          12     8.9
  10                 22    16.3          22    16.3
   9                 15    11.1          21    15.6
   8                 18    13.3          25    18.5
   7                 18    13.3          22    16.3
   6                 13     9.6           8     5.9
   5                 11     8.1           7     5.2
   4                  4     3.0           2     1.5
   3                  3     2.2           3     2.2
   2                  1     0.7           0     0.0

N                   135                 135
Mean             8.5037              8.5852
SD               2.6818              2.25^0



TABLE 3.4e
Frequency Distribution of Number of Items Omitted
for Forms A and B (N = 135)

Number               Form A              Form B
Omitted            Freq.     %         Freq.     %

   7                  2     1.5           2     1.5
   6                  2     1.5           1     0.7
   5                  5     3.7           4     3.0
   4                  6     4.4           5     3.7
   3                 16    11.9          13     9.6
   2                 18    13.3          16    11.9
   1                 25    18.5          29    21.5
   0                 61    45.2          65    48.1

N                   135                 135
Mean             1.3630              1.1852
SD               1.6867              1.5797



TABLE 3.4f
Frequency Distribution of Average Difficulty for
All Items Answered Correctly

                     Form A              Form B
Interval           Freq.     %         Freq.     %

 2.5 -  3.5           1     0.7           0     0.0
 1.5 -  2.5          28    20.7          32    23.7
 0.5 -  1.5          66    48.9          48    35.6
-0.5 -  0.5          31    23.0          43    31.9
-1.5 - -0.5           7     5.2          10     7.4
-2.5 - -1.5           1     0.7           2     1.5
-3.5 - -2.5           1     0.7           0     0.0

N                   135                 135
Mean             0.8055              0.7213
SD               0.8536              0.8732
Minimum Value   -3.03*'             -1.8044
Maximum Value    2.6471              2.4312

TABLE 3.4g
Frequency Distribution of Expected Proportion
Correct for Entire Item Pool

                     Form A              Form B
Interval           Freq.     %         Freq.     %

1.0 - 0.9             1     0.7           2     1.5
0.9 - 0.8            14    10.4          23    17.0
0.8 - 0.7            40    30.0          29    21.5
0.7 - 0.6            40    30.0          43    31.9
0.6 - 0.5            27    20.0          28    20.7
0.5 - 0.4            10     7.4           8     5.9
0.4 - 0.3             2     1.4           2     1.5
0.3 - 0.2             1     0.7           0     0.0
0.2 - 0.1             0     0.0           0     0.0
0.1 - 0.0             0     0.0           0     0.0

N                   135                 135
Mean            0.66244             0.66712
SD              0.118111            0.U831
Minimum Value   0.28988             0.35343
Maximum Value   0.93656             0.92536

3.5 Tests for Normality of Score Distributions

Inferences that use the usual correlational statistics to evaluate
the reliability of the BRTT are based on the assumption that score
distributions are approximately normal. Data which deviate from this
assumption may produce inflated correlations. The distributional
characteristics of the data were investigated by plotting the percentile
from the standard normal distribution (z) against the same percentile from
the various empirically observed score distributions. Whenever the
empirical values follow the normal distribution exactly, the points fall
precisely on a straight line, and deviations from the straight line
represent deviations from normality. Points which fall above the line
represent observed values which exceed the expected values, whereas
points which fall below the line represent values below expectation.
Observations which tend to inflate correlations are those with percentile
rank greater than 50 (less than 50) and with percentiles falling above
(below) the line.
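
The percentile comparison described above can be sketched in a few lines.
This is only an illustration under stated assumptions, not the procedure
used in the study: the function names, the plotting position (i - 0.5)/n,
and the example scores are all hypothetical.

```python
from statistics import NormalDist, mean, stdev

def normal_percentile_pairs(scores):
    """Pair each ordered score with the standard normal percentile (z)
    at the same cumulative proportion.

    The plotting position (i - 0.5) / n is an assumed convention.
    """
    ordered = sorted(scores)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]

def deviations_from_line(pairs):
    """Observed value minus the value expected under normality.

    Under normality the points fall on the line x = mean + sd * z, so a
    positive deviation corresponds to a point above the line.
    """
    xs = [x for _, x in pairs]
    m, s = mean(xs), stdev(xs)
    return [x - (m + s * z) for z, x in pairs]

# Hypothetical theta estimates, for illustration only.
example_scores = [-2.6, -0.4, 0.1, 0.5, 0.8, 1.0, 1.2, 1.5, 1.9, 2.4]
pairs = normal_percentile_pairs(example_scores)
for (z, x), d in zip(pairs, deviations_from_line(pairs)):
    print(f"z = {z:6.3f}  observed = {x:5.2f}  deviation = {d:6.3f}")
```

In this made-up sample the lowest score lies below the line, which is the
kind of long negative tail the plots are meant to reveal.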

Figures 3.5a and 3.5b show the distribution of θ for Forms A and B
respectively. Both plots show deviations from normality; under normality
the Form A plot shows three negative values smaller than would be expected,
and the Form B plot shows that the positive tail of the distribution is
shorter than expected. These observations might suggest that a suitable
transformation be made on the θ scores to achieve normality. However, since
the resulting transformation would depend on this specific data set, it might
not be suitable for other data sets. We determined not to transform the data
because a reliability analysis based on the transformed data would not be of
general use. Research to develop measures of reliability which are not overly
dependent on the distribution of ability in the population is recommended.

The Pearson correlation coefficient, which is based on the scores
described, is used as a measure of reliability in the remaining analyses.
The sample Pearson correlation coefficient can be used to estimate the
population Pearson coefficient. Since the design for this study is a
close approximation to simple random sampling of subjects, the sample
correlation coefficient is an unbiased estimate of the population
parameter.

Figures 3.5c through 3.5n show the distributional characteristics
of the scores Ω, b_total, b_correct, b_final, b_high, and p; all the scores
show deviations from normality. The score showing the least departure
from normality is b_correct. To further investigate the distribution of
b_correct, a bivariate plot of Form A versus Form B is included in Figure
3.5o. This plot exhibits several peculiarities which could lead to
rejection of normality upon a finer analysis than was conducted here.

Figure 3.5b Distributional Characteristics of Theta (Form B)
Figure 3.51 Distributional Characteristics of b (Form B)
Figure 3.5k Distributional Characteristics of (Form A)
Figure 3.5l Distributional Characteristics of (Form B)

3.6 Information Function



The information associated with the maximum likelihood estimate of
the ability parameter θ is given by Lord and Novick (1967, p. 460) as:

\[ I(\theta) = \sum_{i=1}^{n} \frac{[P_i'(\theta)]^2}{P_i(\theta)\,Q_i(\theta)} \]

where P_i(θ) is the probability of a correct response to item i,
Q_i(θ) = 1 - P_i(θ), and P_i'(θ) is the derivative of P_i(θ)
with respect to θ.
For the BRTT, the value of n is 25 and the items may be different
for different examinees.

The values of the information for each estimated ability level were
computed based upon the actual items administered. The values were then
transformed to obtain the information at each estimated Ω-score of ability.
The information in Ω is

\[ I(\Omega) = \frac{I(\theta)}{[\Omega'(\theta)]^2} \]

where I(θ) is given above, θ is given implicitly as Ω(θ) = Ω, and Ω'(θ) is
the derivative of the transform with respect to θ.
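
A minimal sketch of the information computation follows. It rests on
several assumptions that do not come from this report: the item
characteristic curve is taken to be the three-parameter logistic with
scaling constant D = 1.7, the item parameters are invented, and the
function names are hypothetical. The rescaling at the end uses an arbitrary
linear transform purely to illustrate the division by the squared derivative.

```python
import math

D = 1.7  # assumed logistic scaling constant; not stated in the report

def p_correct(theta, a, b, c):
    """Assumed three-parameter logistic item characteristic curve."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def p_slope(theta, a, b, c):
    """Derivative of p_correct with respect to theta."""
    logistic = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
    return (1.0 - c) * D * a * logistic * (1.0 - logistic)

def information_at(theta, items):
    """Sum over administered items of P'(theta)^2 / (P(theta) Q(theta))."""
    total = 0.0
    for a, b, c in items:
        p = p_correct(theta, a, b, c)
        total += p_slope(theta, a, b, c) ** 2 / (p * (1.0 - p))
    return total

def transformed_information(info_theta, transform_slope):
    """Information on a transformed scale: I(theta) / (slope of transform)^2."""
    return info_theta / transform_slope ** 2

# Hypothetical (a, b, c) parameters for three administered items.
items = [(1.0, -0.5, 0.2), (1.2, 0.0, 0.2), (0.8, 0.7, 0.2)]
info = information_at(0.0, items)
print(round(info, 4))
# A hypothetical linear rescaling with slope 0.5 multiplies the information by 4.
print(round(transformed_information(info, 0.5), 4))
```

Because the items differ across examinees, a function of this kind would be
evaluated once per examinee with that examinee's own 25 items.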

Figures 3.6a and 3.6b display the scatter-plots of I(Ω) vs. Ω for
Forms A and B respectively. Each scatter-plot has been smoothed using a cubic
spline interpolation available in the SPEAKEASY computing system.
These results are displayed by the dashed line. In addition, the solid
line in Figure 3.6a represents the simulated reciprocals of the variance
of the maximum likelihood estimator given in Lord (1977a).

Figure 3.6a presents one of the most interesting results of this
study: the simulated curve is a very close approximation to the actual
outcome of a live experiment. The theory provides a useful tool in
the evaluation of mental tests, but one should be cautioned that the
simulated curve was obtained by simulating responses to items according
to the item response model. Further, the empirical information function
is calculated according to the theoretical model for responses based on
those items chosen by the BRTT in the experiment.

Figure 3.6a Observed Information (dots), Smoothed Information (dashes),
Simulated Information (solid line) (Form A)

3.7 Parallelism, Reliability, and Validity at the 25th Item

This section presents data relevant to three important characteristics
of the BRTT: the reliability of the scores, the validity of the construct
measured, and the parallelism between each score formulated for Forms
A and B of the test.

Table 3.7a presents reliabilities for seven scores obtained from the
BRTT. The reliabilities are measured by the Pearson product moment correlation
coefficient between the scores obtained on Form A and those obtained on Form
B for each student. An adjusted reliability for the PSAT verbal score was
computed by obtaining the test's reliability from the statistical analysis
report (Form 3APT1) published by Educational Testing Service as a standard
postadministration procedure. The reliability was adjusted for the
obtained sample by Gulliksen's formula (1950, p. 11^):

\[ r_{xx} = 1 - \frac{S_X^2}{s_x^2}\,(1 - R_{XX}) \]

where:

R_{XX} is the reliability of the test for the population,

S_X^2 is the variance of the test for the population,

s_x^2 is the variance for the sample.

The published reliability was R_{XX} = .89 with S_X^2 = lK8il. Given sample
variance s_x^2 = ]r].^2, the adjustment yielded a reliability r_{xx} = .9111.



To facilitate comparison of the 65 item PSAT with the 25 item BRTT,
the Spearman-Brown formula

\[ r_{KK} = \frac{K\,r_{xx}}{1 + (K-1)\,r_{xx}} \]

was applied with K=65/25. The expected reliability of the PSAT reduced
to 25 items was 0.65. The reliability is directly comparable to the
correlations shown for the BRTT scores.
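
The two adjustments can be written as short functions. This is a sketch
under assumptions, not a reconstruction of the study's computation: the
function names are hypothetical, the length factor k is taken here as new
length divided by old length, and the numbers are illustrative values rather
than the figures used in the study.

```python
def gulliksen_adjust(pop_reliability, pop_variance, sample_variance):
    """Reliability in a sample whose variance differs from the population's,
    assuming the error variance is the same in both groups."""
    return 1.0 - (pop_variance / sample_variance) * (1.0 - pop_reliability)

def spearman_brown(reliability, k):
    """Reliability of a test whose length is changed by the factor k
    (assumed convention: new length divided by old length)."""
    return k * reliability / (1.0 + (k - 1.0) * reliability)

# Illustrative values only; they are not the figures used in the study.
adjusted = gulliksen_adjust(pop_reliability=0.80, pop_variance=100.0,
                            sample_variance=200.0)
halved = spearman_brown(0.90, k=0.5)
print(round(adjusted, 4), round(halved, 4))
```

A more heterogeneous sample raises the reliability estimate, and shortening
a test lowers it; lengthening the shortened test by the inverse factor
recovers the starting value.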

Inspection of the column headed r_ab in Table 3.7a reveals that all of
the BRTT scores were more reliable than the PSAT score at the 25th item.
The highest reliability was found for θ, Ω and p. The scores which were
computed from the mean difficulty of all items administered (b_total),
of all items answered correctly (b_correct), b_high, or the final item
administered (b_final) displayed the lowest reliability. In addition to the
Pearson product moment correlation coefficient, Table 3.7a gives Spearman's
rho, which is a measure of reliability insensitive to a monotone
transformation of the score. All findings based on the obtained values
of this measure are compatible with those based on the Pearson product moment
correlation.
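
The insensitivity of Spearman's rho to monotone transformations can be
checked directly. The sketch below is illustrative only: the helper names
and the six pairs of scores are hypothetical, and the exponential is an
arbitrary choice of increasing transformation.

```python
import math

def pearson(x, y):
    """Pearson product moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical Form A and Form B scores for six examinees.
form_a = [-1.2, 0.3, 0.9, 1.4, 2.0, 2.8]
form_b = [-0.8, 0.1, 1.3, 1.1, 1.7, 3.1]
# exp() is an arbitrary monotone increasing transformation.
a_t = [math.exp(v) for v in form_a]
b_t = [math.exp(v) for v in form_b]
print(pearson(form_a, form_b), pearson(a_t, b_t))            # these differ
print(spearman_rho(form_a, form_b), spearman_rho(a_t, b_t))  # these agree
```

This is consistent with Table 3.7a, where θ, Ω and p share the same rho
although their Pearson coefficients differ slightly.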

Figure 3.7a is a scattergram of the θ scores obtained for Form A
vs. Form B. Figure 3.7b is a scattergram of Ω and Figure 3.7c is a
scattergram of p (the expected proportion correct over the entire item
pool). These figures include the five anomalous cases that exhibited
peculiar response patterns. The three most separated points in the
"northwest" corner of the scatterplots correspond to three of the anomalous
cases. For these cases, Form B scores were considerably less than
Form A scores. This discrepancy probably occurred because Form B was
the second test and these individuals averaged less than eight seconds
per item, suggesting a loss of attention or interest.

Table 3.7b summarizes the correlations between the scores from Forms
A and B. These correlations and Figures 3.7d, 3.7e, 3.7f and 3.7g
indicate the extent to which the scores preserve the ranking among
individuals as ordered by θ. As can be seen from the graphs, Ω and p
preserve order exactly. The scores that are based directly on the difficulties
are b_total, b_correct, b_high and b_final. The relation between these
and the maximum likelihood estimate θ is depicted in the scatterplots of
Figures 3.7h through 3.7o. As the scatterplots indicate, these scores
are not simple monotone transformations of θ, as are p and Ω, but are
distinctly different from θ. The construct measured by these scores,
and its relation to the theoretical ability θ, is an area for further
research which should be conducted before a determination is reached
concerning which scores should be used in practice.

Table 3.7b also presents correlations between the BRTT scores and
the PSAT verbal score; Table 3.7c shows the frequency distribution of
PSAT scores. Figures 3.7p through 3.7cc display the associated
scatterplots. The correlations are adjusted for tests of equal length. These
correlations may be interpreted as a form of concurrent validity; they
indicate the extent to which the BRTT and PSAT measure a common
construct. As can be seen from the table, the correlations between the
two tests were reasonably high, and b_correct and b_high had among the
highest correlations with the PSAT. Since a high score on the PSAT
requires that the student answer a large number of items correctly,
including some of high difficulty, it is possible that this relationship
results from a psychological variable related to accuracy on difficult
items. This explanation is highly speculative, but it offers an
intriguing possibility for future research. Although the BRTT-PSAT
correlations are high, they are not perfect. This suggests that the
two tests do not measure exactly the same construct. The difference
may occur because the PSAT score includes items which measure reading
comprehension skills while the BRTT does not include such items. Also,
the BRTT is computer-presented while the PSAT is a pencil-and-paper
instrument. Finally, the variation may be partly attributed to the
usual differences which characterize items selected for any set of
parallel tests.

An important question is that of the parallelism of the two forms
of the BRTT. Parallelism in this context involves whether a given score
from Form A was significantly different from the score as computed using
Form B. If the two forms yielded different mean values with respect to a
given score, the tests would have to be equated before individual score
comparisons could be made.

Table 3.7d presents paired t-tests and one-sample van der Waerden
tests between Forms A and B for the various scores. A significant test
statistic indicates that the differences between scores were not due to
chance, and the two forms cannot be considered parallel with respect to
the score. All eleven anomalous cases were omitted in these analyses.
The one-sample van der Waerden test was performed to ensure that any
significant t statistic was due to a real difference between Forms A and B
and not an artifact due to non-normality in the data.
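
The paired t statistic on the Form A minus Form B differences can be
sketched as follows. The function name and the five pairs of scores are
hypothetical, and the van der Waerden test is not reproduced here; the
sketch covers only the parametric half of the comparison.

```python
import math
from statistics import mean, stdev

def paired_t(form_a, form_b):
    """t statistic and degrees of freedom for the differences A minus B."""
    diffs = [a - b for a, b in zip(form_a, form_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical paired scores for five examinees.
a = [1.1, 0.4, 2.0, -0.3, 1.6]
b = [0.9, 0.5, 1.6, -0.6, 1.5]
t, df = paired_t(a, b)
print(round(t, 3), df)
```

A positive t indicates that Form A scores tend to exceed Form B scores,
which matches the sign convention noted under Table 3.7d.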

The data presented in this section indicate that p, or a score
closely related to it, would be the best choice for the BRTT. In terms of
reliability, p is, with θ and Ω, among the most reliable of the scores
studied. Unlike θ and Ω, p does not suffer from the infinity problem
discussed in Section 3.9. A student who answers all items incorrectly on
the BRTT would obtain a p score of 0; one who answers all items correctly
would obtain a p score of 1. In the case of θ and Ω, such individuals
would obtain indeterminate scores. Furthermore, for these data, p is a
parallel score, whereas θ and Ω are not. This would indicate that the
need for test equating is reduced when p is used, provided the item pools
for the alternative forms are comparable with respect to the a, b, and c
parameters.

Table 3.7a
Correlation Between Forms A and B for Score at 25th Item (N=135)

SCORE           r_ab       rho_ab

θ               .8719      .8585
Ω               .8730      .8585
b_total         .8247      .8163
b_correct       .8195      .8068
b_high          .7261      .7448
b_final         .6985      .6508
p               .8732      .8585

Table 3.7b
Correlations Between Score at 25th Item, θ, and PSAT Verbal

Score         r_aθ       r_bθ       r_a,PSAT†          r_b,PSAT†
              N=135      N=135      N=92               N=92

θ                                   .8517 (.7547)      .8745 (.7749)
Ω             .9978      .9987      .8616 (.7684)      .8750 (.7803)
b_total       .9717      .9781      .8335 (.7225)      .8798 (.7626)
b_correct     .9695      .9778      .8323 (.7191)      .8848 (.7645)
b_high        .9503      .8828      .9105 (.7405)      .9042 (.7354)
b_final       .8449      .9144      .8303 (.6623)      .8655 (.6904)
p             .9954      .9989      .8575 (.7648)      .8719 (.7776)

† Adjusted for attenuation (see Lord and Novick (1967, p. 70)).
The number appearing in parentheses is the unadjusted Pearson correlation
coefficient.
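
The attenuation adjustment in the footnote can be illustrated with the
standard correction, which divides the observed correlation by the square
root of the product of the two reliabilities. Which reliabilities the
authors actually used is an assumption here: the sketch pairs the Form A to
Form B reliability of Ω from Table 3.7a with the adjusted PSAT reliability
of .9111 from the text, and the function name is hypothetical.

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Correlation corrected for the unreliability of both measures."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Omega, Form A: unadjusted correlation with the PSAT (Table 3.7b),
# Form A to Form B reliability (Table 3.7a), and the adjusted PSAT
# reliability quoted in the text. The choice of inputs is an assumption.
adjusted = correct_for_attenuation(r_xy=0.7684, r_xx=0.8730, r_yy=0.9111)
print(round(adjusted, 4))
```

Under these assumed inputs the result agrees with the tabled adjusted value
for Ω on Form A to four decimal places.
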

Table 3.7c
Distribution of PSAT Scores

Score
Interval          Frequency      %

71.5 - 76.0            1         1.1
66.5 - 71.5            9         9.8
61.5 - 66.5            8         8.7
56.5 - 61.5            4         4.3
51.5 - 56.5           14        15.2
46.5 - 51.5           14        15.2
41.5 - 46.5           10        10.9
36.5 - 41.5           17        18.5
31.5 - 36.5            3         3.3
26.5 - 31.5            7         7.6
21.5 - 26.5            3         3.3
16.5 - 21.5            0         0.0
12.0 - 16.5            2         2.2

N                     92
MEAN              48.177
SD                13.343
Minimum Value     12.854
Maximum Value     75.284

Figure 3.7r PSAT vs. Omega (Form A)

Table 3.7d
Paired Tests Between Scores Calculated at the 25th Item
for Forms A and B of the BRTT (N=135)

                                      One-Sample
Score         t-Test    (P-Value)           van der Waerden Test   (P-Value)

θ             2.0938    (.038164)           4.1465                 (3.3756 x 10^-5)
Ω             2.1517    (.033215)           4.2343                 (2.2926 x 10^-5)
b_total       2.0497    (.042341)           4.2791                 (1.8761 x 10^-5)
b_correct     1.8789    (.062434)           3.9204                 (8.8402 x 10^-5)
b_high        .34097    (.39438)            1.0353                 (.30052)
b_final       3.4747    (6.6947 x 10^-4)    6.41                   (1.4547 x 10^-10)
p            -.84969    (.39702)           -1.3169                 (.18787)

Tests are computed on the difference, Form A minus Form B.

3.8 Parallelism and Reliability Across All Items

The previous section presented data on the reliability of the BRTT
for scores computed at the 25th item. Here we consider the reliability
of the test at all items.

Figure 3.8a shows the reliability of the BRTT as compared to that of
the PSAT at 1, 2, ..., 25 items. The reliability for the BRTT was
obtained by correlating the scores for each subject for Forms A and B at
each step. The reliability of the PSAT was obtained by adjusting the
published reliability of the 65 item test to lengths of 1, 2, ..., 25 items
by means of the Spearman-Brown formula.

Figures 3.8b through 3.8e show the comparison of the BRTT to the
PSAT when the scores Ω, p, b_total, and b_correct are used. As with θ, the
reliability of the BRTT is higher at all points.

Figures 3.8f through 3.8j show the mean value for the sample for each of
the scores discussed above. In reading these graphs, note that the solid
line represents Form A and the broken line represents Form B (in the previous
graphs the solid line indicated the BRTT and the broken line the PSAT).

Figure 3.8b Reliability of Omega Compared to Reliability of PSAT.
Figure 3.8c Reliability of p Compared to that of PSAT.
(Horizontal axis: Number of Items Administered.)
Figure 3.8f Mean of Sample for Theta
Figure 3.8h Mean of Sample for p
Figure 3.8i Mean of Sample for b_total
Figure 3.8j Sample Mean for b_correct

The BRTT employs an item bank in which three parameters have been
previously estimated for each item. These parameters are (a) discrimination,
(b) difficulty and (c) guessing. The parameters define an item characteristic
curve which describes the probability of a correct response to the item
given a trait level θ. It is assumed that trait levels vary continuously
and that the probability of a correct response to the item is an increasing
monotone function of the trait level. Given trait level θ, the item
characteristic curves model the probability of observing a particular
response vector. In scoring the BRTT, the problem is reversed: given an
observed response vector, we estimate the trait level which gave rise
to the observed pattern. The procedure used to score the BRTT is the
maximum likelihood technique; the likelihood function is defined
by the equation:

\[ L(\theta) = \prod_{i=1}^{k} P_i(\theta)^{u_i}\,[1 - P_i(\theta)]^{1-u_i} \]

where P_i(θ) = P(u_i = 1 | θ).

The desired estimate of θ is the point at which the likelihood function
is maximized. In general, the likelihood function assumes the form shown
in Figure 3.9a. However, if the examinee answers all items correctly,
the function becomes asymptotic to +1 and assumes the form shown in Figure 3.9b.
In this case, the maximum is taken at +∞, resulting in an estimate for θ
equal to +∞. Similarly, if the examinee answers every question incorrectly,
the likelihood function will assume the form shown in Figure 3.9c and will
yield an MLE of -∞. Therefore, the MLE of θ can only be employed
if the examinee's responses produce a likelihood function whose maximum is
obtained at a real-valued number. Given that the estimate of θ is
non-infinite, the likelihood function will tend to provide increasingly precise
estimates as the length of the response vector increases.
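
The three shapes of the likelihood function can be reproduced numerically.
The sketch below makes assumptions not found in this report: a
three-parameter logistic curve with scaling constant D = 1.7, invented item
parameters, and a search grid limited to θ between -6 and 6. The function
names are hypothetical.

```python
import math

D = 1.7  # assumed logistic scaling constant

def p_correct(theta, a, b, c):
    """Assumed three-parameter logistic item characteristic curve."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def likelihood(theta, items, responses):
    """Product over items of P^u * (1 - P)^(1 - u)."""
    value = 1.0
    for (a, b, c), u in zip(items, responses):
        p = p_correct(theta, a, b, c)
        value *= p if u == 1 else 1.0 - p
    return value

# Hypothetical (a, b, c) parameters for four items.
items = [(1.0, -1.0, 0.2), (1.0, 0.0, 0.2), (1.0, 1.0, 0.2), (1.0, 2.0, 0.2)]
grid = [g / 10.0 for g in range(-60, 61)]  # theta from -6 to 6

def grid_maximum(responses):
    """Grid value of theta at which the likelihood is largest."""
    return max(grid, key=lambda t: likelihood(t, items, responses))

print(grid_maximum([1, 1, 0, 0]))  # mixed pattern: interior maximum
print(grid_maximum([1, 1, 1, 1]))  # all correct: maximum at the upper edge
print(grid_maximum([0, 0, 0, 0]))  # all incorrect: maximum at the lower edge
```

For the uniform patterns the maximum always sits on the edge of whatever
grid is chosen, which is the numerical counterpart of an infinite estimate.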

Figure 3.9d shows the likelihood functions observed for a single
individual taking Form A of the BRTT. The item difficulty administered
and the MLE of θ are shown at each step. Note that the likelihood function
attains its maximum at an infinite value of θ at steps 1 and 2; the
estimate of θ at these points is set equal to the difficulty of the item
administered. The estimates become stable fairly rapidly.

Figure 3.9e shows the analogous graphs for the same individual on
Form B of the BRTT.

Figure 3.9e continued

3.10 Analysis of the Numerical Algorithm for Determining θ

We investigated the performance of the system with respect to the
computation of the maximum likelihood estimate. Several interesting
results were obtained which suggest refinements to future operational
systems.

As described in Section 3.9, the CAT system uses the statistical
method of maximum likelihood for estimating an individual's ability
parameter given his string of responses to the items chosen by the
computer. The numerical technique employed for determining this ability
parameter is the Newton-Raphson method. The exact mathematical details will
be deferred for the moment so that we may focus on two functions of theta
which depend on the examinee's string of responses. Given the ability
parameter θ, denoting the probability of correct response to the ith item
by P_i(θ) and the derivative with respect to θ of this function by P_i'(θ),
define the weight for item i as

\[ w_i(\theta) = \frac{P_i'(\theta)}{P_i(\theta)\,Q_i(\theta)} \tag{1} \]

where Q_i(θ) = 1 - P_i(θ) is the probability of incorrect response given θ.

Defining u_i = 1 for a correct response and u_i = 0 for an incorrect
response to item i, we have

\[ S(\theta) = \sum_i u_i\,w_i(\theta). \tag{2} \]

This is the weighted number right score described in Section 3.3.

The second function is the expected weighted number correct given
θ and is defined as

\[ T(\theta) = E[S(\theta) \mid \theta]. \tag{3} \]

The function can be further specified by using the relationship from
item response theory that states that the expected value of the response
u_i to item i is P_i(θ); hence,

\[ T(\theta) = \sum_i P_i(\theta)\,w_i(\theta). \tag{4} \]

The statistical theory of maximum likelihood estimation reduces in this
case to finding the value of θ that satisfies the equation

\[ S(\theta) = T(\theta). \tag{5} \]

In other words, the maximum likelihood estimate of the ability is that
value which best fits the subject's responses in the sense that the
weighted number right score is equal to its theoretical expected value.

Figures 3.10b and 3.10c are plots of the obtained S and T values for
each examinee from Forms A and B of the BRTT. These figures show equality
between S and T. The figures verify that the computer program in the
CAT system is accurately finding the maximum likelihood estimator.

The maximum likelihood estimator of θ is the value which maximizes the
likelihood given the response vector u = (u_1, ..., u_k):

\[ L(\theta) = \prod_{i=1}^{k} P_i(\theta)^{u_i}\,[1 - P_i(\theta)]^{1-u_i} \tag{6} \]

The value of θ that maximizes L(θ) can also be obtained as the maximum
of the logarithm of the likelihood:

\[ l(\theta) = \sum_i u_i \log P_i(\theta) + \sum_i (1 - u_i) \log[1 - P_i(\theta)] \tag{7} \]

It can be verified that the derivative of l(θ) with respect to θ is

\[ l'(\theta) = \sum_i u_i\,w_i(\theta) - \sum_i P_i(\theta)\,w_i(\theta). \tag{8} \]

From the definitions of S(θ) and T(θ), equations (2) and (4), we see that
this is equivalent to

\[ l'(\theta) = S(\theta) - T(\theta) \tag{9} \]

so that the maximum likelihood estimator which satisfies equation (5)
equivalently satisfies the equation

\[ l'(\theta) = 0. \]

This is true since the extrema of a function, in particular l(θ),
may be found, under suitable assumptions, by setting the derivative or
rate of change equal to 0. This fact follows from observing that in
Figure 3.10a the line tangent to l(θ) at θ has slope equal to l'(θ) and that
the maximum of l(θ) is obtained when the line is flat, that is, when the
slope is 0.
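
The condition S(θ) = T(θ) can be solved iteratively, as the sketch below
illustrates. The report names the Newton-Raphson method but defers its
details, so everything specific here is an assumption: the
three-parameter logistic curve with D = 1.7, the invented item parameters,
the step limit, the bound of 6 on θ, and the use of the information sum of
Section 3.6 in place of the exact second derivative (a variant usually
called Fisher scoring). All function names are hypothetical.

```python
import math

D = 1.7  # assumed logistic scaling constant

def curve(theta, a, b, c):
    """Return P(theta) and its derivative under an assumed 3PL logistic model."""
    logistic = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
    p = c + (1.0 - c) * logistic
    slope = (1.0 - c) * D * a * logistic * (1.0 - logistic)
    return p, slope

def s_and_t(theta, items, responses):
    """Weighted number right S (equation 2), its expectation T (equation 4),
    and the information, the sum of w_i * P_i'."""
    s = t = info = 0.0
    for (a, b, c), u in zip(items, responses):
        p, slope = curve(theta, a, b, c)
        w = slope / (p * (1.0 - p))  # equation (1)
        s += u * w
        t += p * w
        info += w * slope
    return s, t, info

def estimate_theta(items, responses, start=0.0, bound=6.0, tol=1e-10,
                   max_iter=100):
    """Iterate theta <- theta + (S - T) / information until S equals T.

    Steps are limited to one unit and theta is kept within +/- bound;
    both are assumed safeguards for patterns whose maximum is infinite.
    """
    theta = start
    for _ in range(max_iter):
        s, t, info = s_and_t(theta, items, responses)
        step = max(-1.0, min(1.0, (s - t) / info))
        theta = max(-bound, min(bound, theta + step))
        if abs(step) < tol:
            break
    return theta

# Hypothetical (a, b, c) parameters and a mixed response pattern.
items = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.9, 0.5, 0.2), (1.1, 1.5, 0.2)]
responses = [1, 1, 0, 0]
theta_hat = estimate_theta(items, responses)
s, t, _ = s_and_t(theta_hat, items, responses)
print(round(theta_hat, 4), round(s - t, 8))
```

With an all-correct or all-incorrect pattern the iteration in this sketch
simply runs to the bound, which mirrors the finite limiting values
discussed in Section 3.12.
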

Figure 3.10a

3.11 Monte Carlo Analysis of the Item Selection Procedure

The BRTT chooses items for an individual examinee which best suit
his or her ability. In the ideal situation, items are chosen so that the
examinee has about a 65% estimated chance of responding correctly and the
responses to such items give the most information about the examinee's ability.
Ideally, the BRTT would use prior information on the individual to choose all
the items to match the examinee's ability so that each response would be
maximally informative. However, prior to administration of the first item no
specific information about the individual's ability exists. As responses are
accumulated, the BRTT obtains progressively better estimates of the examinee's
ability so that estimates toward the end of the test are more accurate than
estimates at the beginning. This process was illustrated graphically in
Section 3.9. Because the precision of estimation changes over the course of
the test, the BRTT will administer some items which are too difficult and some
which are too easy for the particular examinee.

A rough idea of how far the actual item selection procedure deviates
from the ideal case may be obtained by examining the relationship between
the number of correct responses and the final estimate of ability.
Graphs 3.11a and 3.11c display the number correct vs. θ, the measure of
ability, for Forms A and B. Under the ideal circumstance of having the
BRTT administer items such that the probability of a correct response is
.65, no regression of number correct on the estimate of ability would be
expected (assuming that the final estimate is very close to the examinee's
true ability). However, the graphs reveal some regression, particularly
for Form A. The correlations between number correct and theta are
.6970 for Form A and .5616 for Form B.

One way of studying the correspondence of the assumed model for response
to the true underlying model is to compare the examinee's actual responses
to the responses obtained by simulating each examinee's responses based
on the estimated θ. For each item actually administered to an examinee
having the estimated theta, the simulated response was obtained by randomly
generating a number in the unit interval. If the number was greater than
P(θ), the response was taken to be incorrect; if it was less than P(θ), the
response was taken to be correct. If the item response probabilities are
modeled accurately and the estimated θ is reasonably accurate, the same
pattern should appear in the simulated number correct as in the observed
number correct. Graphs 3.11b and 3.11d display the simulations for Forms
A and B. The patterns are not very similar; the relationship between the
simulated and live data is not the same.
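
The simulation rule just described can be sketched as follows. The
three-parameter logistic curve with D = 1.7, the item parameters, the
estimated θ of 0.8, the seed, and the function names are all assumptions
made for illustration; they are not taken from the study.

```python
import math
import random

D = 1.7  # assumed logistic scaling constant

def p_correct(theta, a, b, c):
    """Assumed three-parameter logistic item characteristic curve."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def simulate_responses(probabilities, rng):
    """Score an item correct (1) when a uniform draw is below P(theta)."""
    return [1 if rng.random() < p else 0 for p in probabilities]

def simulated_number_correct(theta_hat, items, rng):
    """Number correct simulated at theta_hat for the items administered."""
    probabilities = [p_correct(theta_hat, a, b, c) for a, b, c in items]
    return sum(simulate_responses(probabilities, rng))

# Hypothetical examinee: estimated theta and the (a, b, c) parameters of
# the 25 items administered. The seed is fixed only for repeatability.
rng = random.Random(1977)
items = [(1.0, -1.0 + 0.1 * i, 0.2) for i in range(25)]
print(simulated_number_correct(0.8, items, rng))
```

Repeating this for every examinee and plotting the simulated number correct
against θ would give the analogue of Graphs 3.11b and 3.11d.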

The findings of the Monte Carlo study suggest the need for additional 
research into the determinants of individuals' responses to items. 

3.12 Anomalous Cases

This section presents all cases which the BRTT system failed to
process optimally, and it is hoped that these data can be used for
improvements in the system.

As stated previously, eleven cases were eliminated from the analyses:
six because the BRTT system reported a final θ of -6 on at least one of
the forms, three because the Form A to Form B final estimates of θ were
inconsistent, and two because their Form A estimates of θ were +∞ after
the 8th item and -∞ after the 9th item. The second of these three categories
was caused by loss of attention or fatigue on the second test administered.
The remaining cases were anomalous due to a feature of the type of numerical
algorithm used by the BRTT to determine the maximum likelihood estimator.

Martha Stocking of ETS found that a modified Newton-Raphson procedure
could be successful in overcoming the infinite θ problem in seven of the
cases. In the remaining case, the θ of -6 was a proper approximation to
the maximum likelihood estimator, since the likelihood function, due to the
examinee's responses, attained its maximum at -∞. The transformation
of θ = -6 to the p score (the expected proportion correct for the entire
item pool) is equal to the proportion correct if the examinee had no
knowledge of the material and was guessing. This is a reasonable way
to handle actual maximum likelihood estimates that equal -∞.
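
The relationship between θ = -6 and the guessing level can be illustrated with a short sketch. It assumes the three-parameter logistic model with scaling constant 1.7; the item pool below is hypothetical and is not the BRTT pool.

```python
import math

D = 1.7  # conventional logistic scaling constant (assumed)

def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def expected_proportion_correct(theta, pool):
    """The p score: mean probability of success over every item in the pool."""
    return sum(p_correct(theta, a, b, c) for a, b, c in pool) / len(pool)

# Hypothetical pool of (a, b, c) triples.
pool = [(1.0, -2.0, 0.20), (1.1, -1.0, 0.20), (0.9, 0.0, 0.25), (1.2, 1.5, 0.20)]
mean_guessing = sum(c for _, _, c in pool) / len(pool)

p_floor = expected_proportion_correct(-6.0, pool)
print(round(p_floor, 4), round(mean_guessing, 4))
```

For a pool like this one, the p score at θ = -6 is practically indistinguishable from the mean guessing parameter, which is why reporting -6 in place of -∞ costs essentially nothing on the p scale.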

Section 3.13 Summary and Conclusions

This section summarizes the data presented in Chapter III and
highlights a number of significant findings.

Data log files created by the computerized test administration
system were subjected to statistical analysis. Nine score variables
were derived from the data and examined. Three scores of particular interest
are θ, the parameter commonly used to denote the latent trait; a
monotone transformation of θ; and p, the expected proportion of items
correct over the entire item pool.

Frequency distributions of the scores revealed that Form A
and Form B had roughly comparable distributions. Tests for the normality
of the sample data revealed some deviations from normality. It was
decided not to attempt to transform the data to a more normal form, since
it was unlikely that the transformation function would generalize to
subsequent samples. In some analyses, non-parametric statistics were
employed to compensate for the lack of an underlying normal distribution.

A major finding of the study was that the information yield of the
BRTT closely approximated the simulation results reported by Lord (1977a).
This result was important because it confirmed that the accuracy of the
BRTT conformed to theoretical expectation. Although this result must be
interpreted with some caution, since the empirically observed information
functions were calculated by use of the three-parameter logistic model,
the finding is broadly supportive of the utility of the three-parameter
model in general, and of the design of the BRTT. Monte Carlo simulations
disclosed some discrepancies between theoretical expectations and
observation. The simulation results suggest that there is need for
better methodology for finding mathematical models that adequately
describe a subject's responses to individual items.

A second major finding of the study was that the BRTT proved to
be highly reliable. The reliability of the 25-item test was .8719, which
compares favorably with the 65-item PSAT, whose reliability was .9111.
Since the length of the BRTT was only 38% of that of the PSAT, this result
confirmed theoretical expectations regarding the increased efficiency of
adaptive over conventional testing.
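
One conventional way to put the two tests on a common length is the Spearman-Brown formula. The sketch below assumes that formula; this section does not state how the PSAT reliability was scaled, so the projected value is illustrative only.

```python
def spearman_brown(reliability, length_ratio):
    """Projected reliability when test length is multiplied by length_ratio."""
    return (length_ratio * reliability) / (1.0 + (length_ratio - 1.0) * reliability)

psat_reliability = 0.9111  # 65-item PSAT verbal score, as reported in the text
ratio = 25 / 65            # BRTT length relative to the PSAT
projected = spearman_brown(psat_reliability, ratio)

print(f"length ratio: {ratio:.0%}")
print(f"PSAT reliability projected to 25 items: {projected:.4f}")
```

Under this assumption, the projected 25-item PSAT reliability can be compared directly with the .8719 observed for the 25-item BRTT.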

A third major finding of the study was that p, the expected proportion
correct over the entire item pool, appeared to be the most desirable score
for general use. Of all scores studied, θ and p exhibited the highest
reliabilities. However, Forms A and B were not parallel with respect to θ
or its monotone transformation. When p was employed, the scores on both
forms were directly comparable.

Chapter IV
Student Response to Adaptive Testing

4.1 Collection of Attitude Data

The process of performing an assessment involves two variables,
the instrument and the individual. Chapter III was concerned with the
properties of the instrument; this section is focused on the second
variable: the student.

Siegel (1969) suggested that attitude judgments should be made
immediately after the test session, because perceptions might be subject
to motivated forgetting, which would reduce the initial differences in
perceived validity. For the same reason, estimates of overall test
difficulty and probability of success should also be made at that time.

Following completion of both forms of the BRTT, students were asked
to complete a posttest questionnaire. One hundred twenty-four students
completed the questionnaire (which is reproduced in Appendix A). The
remaining students either pleaded fatigue or had insufficient time to
complete the questionnaire within the allotted class period.

Attitudes are important to the extent that they affect performance.
Weiner (1957) found that examinees with distrustful attitudes had
impaired performance on the WAIS picture completion and similarities
subtests. It was thought that the distrustful comments made by the
examinee interfered with his ability to make the correct response.

I. Sarason (1972) and Wine (1971) have suggested that attitudes
affect anxiety level and performance by distracting the examinee's
attentional focus from task-relevant variables.

In the current study, the questionnaire employed included items
designed to determine students' prior familiarity with computers (Koch
and Patience, 1977); subjective perception of difficulty, anxiety, and
motivation (Prestwood, 1978); factors in the human/computer interface
(Alderman, 1978); preference for adaptive vs conventional testing; and
feedback or knowledge of results (Prestwood, 1978). The questionnaire
also solicited student opinion on the best and least liked factors in
the adaptive test (Schmidt, Urry, and Gugel, 1977). These latter topics
employed a free-response incomplete sentence blank format; all other items
were multiple-choice and were adapted from the studies referenced above.

Since student attitudes are likely to vary as a function of the
perceived importance of the test, and the test results did not affect the
students' lives, the data reported must be viewed with caution. In
particular, the levels of reported anxiety may be lower than those
experienced in a "live" testing situation (Koch and Patience, 1977).

The section which follows summarizes previous work on student
attitudes to computerized testing. Subsequent sections present the
results for the specific variables measured by the questionnaire.

4.2 Previous Research on Student Attitudes to Computerized Testing



There is a notable lack of literature investigating examinees'
attitudes toward testing, computers, and computerized testing, but
findings generally suggest favorable attitudes toward computerized
adaptive testing. Ahl (1975) concluded from a Creative Computing magazine
survey that most Americans have a generally positive attitude toward
computers and that two-thirds of the population has a fair understanding of
the computer's role and function. Cartwright and Derevensky (1976) found
that teacher education graduate students exposed to computer-assisted
testing had more favorable attitudes toward computer-assisted instruction,
toward programmed instruction, and toward the lectures than students not
exposed to CAT. They suggested that positive experience with computerized
tests can modify previous negative attitudes. Betz and Weiss (1976)
found that for high-ability students, motivation was high on both
stradaptive and conventional tests administered on a cathode ray terminal,
while for low-ability students it was high only on the stradaptive test.
Schmidt, Urry, and Gugel (1977) investigated attitudes of 163
individuals who took a computerized adaptive test of verbal ability and
found an overwhelmingly positive response. Eighty-three percent of those
responding preferred the adaptive test to pencil-and-paper testing; 69%
felt the adaptive test was more fair than a conventional test. Only 10%
of those responding indicated a preference for pencil-and-paper testing,
and 4% felt that the adaptive test was less fair than a conventional
test.

The features most liked about adaptive testing were: reduction in
total test time (35%), simplicity of administration (19%), lack of time
pressure (13%), and potential for quick feedback (10%). The features
least liked were: the inability to review and change previous answers
(23%) and difficulty in adjusting to this method of administration (20%)
(Schmidt, Urry, and Gugel, 1977).

Hedl, O'Neil, and Hansen (1973) investigated preference for
computer-administered vs examiner-administered intelligence tests. These
investigators found that the computerized tests elicited higher anxiety and
less favorable attitudes than the examiner-administered test. However, these
results were probably due to the fact that the computerized protocol
(which was nonadaptive) required that all examinees complete the entire
test, whereas the examiner-administered test was terminated after the
individual failed 10 consecutive items. Given the massive failure
experience, the resultant negative attitudes are not surprising. It is
interesting to note that failure feedback on an "intelligence test" is a
standard procedure for experimentally inducing test anxiety (Levitt,
1967).

Koch and Patience (1978) investigated student attitudes toward
tailored achievement testing. The variables they measured were (1) time
pressure, (2) perceived test difficulty, (3) test anxiety, (4) prior
experience with computers, and (5) overall preference for computerized vs
conventional testing.

These investigators compared attitudes under two circumstances: (1) a
condition in which the test did not count toward the course grade, and (2)
a condition in which the test did count. Results indicated that students
felt significantly more time pressure and anxiety under circumstances in
which the test counted toward the final grade, but no significant
differences were found with regard to perceived test difficulty or overall
preference for adaptive vs conventional testing.

A major advantage of computerized test administration is the ability
to provide the examinee with feedback or knowledge of results (KR).
Research on KR in an adaptive testing environment has been conducted by
Betz and Weiss (1976a, b), Pine (1977), and Prestwood (1977). These
findings will be discussed in Section 4.5, which is concerned with feedback.
It appears that KR affects a number of attitudinal variables. Betz and
Weiss (1976a, b) found that accuracy of perception of test difficulty and
motivation were higher for students in a KR condition than for students
who received no feedback.

4.3 Prior Familiarity with Computers

Some individuals in our society have expressed distrust and unease
with computer technology. To the extent that such attitudes adversely
affect test performance, they are of interest in adaptive testing.
Koch and Patience (1977) have suggested that items tapping prior familiarity
with computers can serve as useful covariates in analysis of subsequent
questionnaire data.

The current study employed three items adapted from Koch and Patience's
(1977) study. These items asked whether the student was at all familiar
with computers and had used a keypunch or a terminal before. The responses
to these items are presented in Table 4.3a. As can be seen from the table,
about half the students reported some familiarity with computers and about
the same proportion had interacted with computers by means of a terminal.
Students were more familiar with computer terminals than with keypunch
machines, a finding that may be surprising to those whose familiarity
with computer systems was acquired during the time when "batch" rather
than interactive systems were predominant.

Table 4.3a

Prior Familiarity with Computers, Total Sample (N=124).

Item  Text of Item                                   Response to Options
                                                         N      %

14.   Are you at all familiar with computers?
          Yes                                           60    48%
          No                                            63    51%
          Omit                                           1     1%

15.   Have you ever punched computer cards at a keypunch machine before?
          Yes                                           44    35%
          No                                            80    65%

16.   Have you ever interacted with a computer by means of a terminal before?
          Yes                                           65    52%
          No                                            59    48%

4.4 Perception of Test Difficulty, Anxiety, and Motivation

The variables considered in this section are important determinants
of an individual's performance. The importance of anxiety as a factor in
task performance has been of major concern since Mandler and Sarason's
(1952) seminal article on test anxiety. A review of the test anxiety
literature is beyond the scope of the present work; however, anxiety has
been repeatedly shown to adversely affect performance at all levels of
academic experience (Gaudry and Spielberger, 1971). Many theorists have
noted that, since anxiety can serve as a learned drive, it can serve
as a motivational variable and will interact with other motivational
variables. One of the most relevant of the motivational theories is
Atkinson's (1958) model of fear of failure, need for achievement, and
their interactional effect on risk-taking behavior.

Most risk-taking experiments investigate risk-taking as a voluntary
action (a dependent variable). In general, individuals with high motive
to approach success prefer to answer questions at an intermediate level
of difficulty, while individuals with high motive to avoid failure prefer
to answer questions that are either very easy or very difficult (Atkinson,
1958). In an adaptive test, however, the level of uncertainty is imposed
by the selection algorithm. Disregarding guessing, all the questions are
aimed at the .5 probability of success for a given examinee; these are
precisely the type of questions that the examinee with high fear of
failure would tend to avoid. High motive to avoid failure has been
equated with high test anxiety (Atkinson and Litwin, 1960). Consequently,
an adaptive test might be stressful for a high-anxious testee, who would
be predicted to experience maximum avoidance tendencies on items which
have a .5 probability of success. Anxiety caused by these avoidance
tendencies might be hypothesized to adversely affect performance. In
particular, it could encourage impulsive responding in order simply to
remove the anxiety-producing stimulus, or it may lead to reflective
responses when the examinee is unable to choose among competing responses
(Spence, 1964). However, Atkinson (1958) predicted that when there is no
choice of level of task difficulty, performance should be optimal when
the probability of success is .5, regardless of whether the motive to
achieve or the motive to avoid failure is stronger. This is hypothesized
to occur because both the motive to achieve and the motive to avoid
failure would be greatest at the 50% point, and both effects would
summate, resulting in maximum motivation to perform. While this theory
predicts maximum performance on the basis of maximum motivation to
perform, it is possible that anxiety caused by such a situation would be
sufficient to interfere with the positive relationship between motivation
and performance (Wine, 1971).
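
The 50% point can be made concrete with the multiplicative form in which Atkinson's model is usually stated. The symbols below are notation introduced here for illustration and do not appear in the report: P_s is the probability of success, M_s the motive to achieve success, M_af the motive to avoid failure, and T_s and T_af the resulting tendencies. The incentive value of success is taken as 1 - P_s and that of failure as -P_s.

```latex
\begin{align*}
T_s    &= M_s \, P_s \, (1 - P_s), \\
T_{af} &= -\, M_{af} \, (1 - P_s) \, P_s, \\
\frac{d}{dP_s}\bigl[ P_s (1 - P_s) \bigr] &= 1 - 2P_s = 0
  \quad\Longrightarrow\quad P_s = 0.5 .
\end{align*}
```

Both tendencies are proportional to P_s(1 - P_s), so each reaches its greatest magnitude at P_s = .5, which is the level of difficulty an adaptive test aims for when guessing is disregarded.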

Atkinson's theory suggests that adaptive testing might be more
stressful to individuals with high test anxiety than conventional tests.
However, a conventional test which is too difficult for an individual is
likely to result in a high proportion of guessing. Guessing, because of
its high random component, reduces the measurement accuracy of a test.
The problem is even more complex when a student guesses as a result of
partial knowledge which permits one or more distractors to be eliminated
(Lord and Novick, p. 303).

While only a small portion of testees generally find a test to be
entirely too difficult, a considerable portion of testees experience
short groupings of overly difficult questions. After a failure experience
with difficult problems, the examinee may develop an impulsive response
set which may lead to errors on subsequent items of more appropriate
difficulty. Walker, Neilson, and Nicolay (1965) found that under stress
conditions (caused by failure at a previous task), anxiety was negatively
correlated with intelligence test performance. Hedl, O'Neil, and
Hansen (1973) found that subjects had greater anxiety and more negative
attitudes toward computer-based testing after a massive failure experience
with items that were too difficult.

Clearly the issues of the examinee's anxiety, motivation, and
perception of test difficulty are important, and their effects on an
individual's performance are complex. Unfortunately, it is difficult to
obtain data on these issues during an important testing session, since
the introduction of experimental interventions during testing is rarely
possible. Since the data collected in the present study were obtained
from student volunteers in an experimental context, they must be regarded
with caution. However, because of the importance of these variables, it
was decided to collect relevant data. To facilitate comparison of these
data with related research, items were adapted from previous studies.

Items 17-22 on the questionnaire were concerned with the student's
perception of the difficulty of the adaptive test. These items were
adapted from a study by Pine (1977, personal communication) and are
reported in Table 4.4a.

Items 17 and 18 were concerned with the students' perception of the
appropriateness of the difficulty of the test. Inspection of the response
distributions reveals that almost none of the students found the test
items always or frequently too easy. Ninety percent of the students found
them sometimes or seldom too easy. Eighty-five percent of the students
found the items too difficult sometimes or frequently; however, only 19%
of the students indicated that they guessed on about half or more of the
questions. The picture that emerges from these responses is that the
students generally felt that the test was appropriate to their ability
level, although somewhat on the difficult side. Very few students found the
test exceptionally easy or difficult.

Item 21 asked the students to rate the difficulty of the test,
overall, in relation to their ability. Half the students felt that the
test was "just about right," while most of those remaining found it
"somewhat too difficult."

It is interesting to speculate to what extent the students'
expectations of success influenced their judgment of difficulty. Students
of high ability generally achieve high number-right scores on
conventional tests, while those of low ability generally obtain low
number-right scores. In the adaptive test, the number-right score should be
unrelated to ability level. In any case, students generally expressed low
levels of frustration with the test.
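
Why number-right should be unrelated to ability in an adaptive test can be seen in an idealized sketch. It assumes the three-parameter logistic model with scaling constant 1.7 and assumes that every tailored item has difficulty exactly equal to the examinee's θ; the actual BRTT selects items from a finite pool, so this is only an approximation. All parameter values are hypothetical.

```python
import math

D = 1.7  # conventional logistic scaling constant (assumed)

def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def expected_proportion_tailored(theta, a=1.0, c=0.2, n_items=25):
    """Idealized adaptive test: each item's difficulty b equals theta."""
    return sum(p_correct(theta, a, theta, c) for _ in range(n_items)) / n_items

def expected_proportion_fixed(theta, difficulties, a=1.0, c=0.2):
    """Conventional test: every examinee answers the same items."""
    return sum(p_correct(theta, a, b, c) for b in difficulties) / len(difficulties)

fixed_difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical conventional test
for theta in (-2.0, 0.0, 2.0):
    print(theta,
          round(expected_proportion_tailored(theta), 3),
          round(expected_proportion_fixed(theta, fixed_difficulties), 3))
```

In this idealization the tailored proportion correct is the same at every ability level, while the conventional proportion correct rises with θ, so students of any ability could reasonably report a similar sense of difficulty.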

Table 4.4b presents the response distribution for items related to
the students' self-reports of anxiety during testing. Generally, students
reported moderate levels of worry, but few (3%) felt that anxiety
unquestionably prevented them from doing their best, while 12% felt that
anxiety might have affected their scores somewhat. Prestwood (1977)
found that examinees who took tailored tests which counted toward a
course grade reported higher levels of anxiety than those whose tests
did not count. It is probable, therefore, that the results of the
current study are not indicative of the anxiety levels which would occur
if the test were important in the individual's academic career.

Table 4.4c presents the response distributions for a series of
items adapted from Pine (1977), designed to measure motivation. As
can be seen from the responses to items 28 and 29, about 30% of the

Table 4.4a

Subjective Perception of Difficulty, Total Sample (N=124).

Item  Text of Item                                   Response to Options
                                                         N      %

17.   How often did you feel that the questions in the test were too easy
      for you?
          Always                                         0     0%
          Frequently                                     6     5%
          Sometimes                                     65    52%
          Seldom                                        47    38%
          Never                                          5     4%
          Omit                                           1     1%

18.   How often did you feel that the questions in the test were too hard
      for you?
          Always                                         1     1%
          Frequently                                    35    28%
          Sometimes                                     71    57%
          Seldom                                        17    14%
          Never                                          0     0%

19.   On how many of the questions did you guess?
          Almost all of the questions                    1     1%
          More than half of the questions                4     3%
          About half the questions                      18    15%
          Less than half of the questions               55    44%
          Almost none of the questions                  46    37%
          None of the questions                          0     0%

20.   How often were you sure that your answers to the questions were
      correct?
          Almost always                                  2     2%
          More than half of the time                    31    25%
          About half of the time                        54    44%
          Less than half of the time                    33    27%
          Almost never                                   3     2%
          Omit                                           1     1%

21.   In relation to your vocabulary ability, how difficult was the test
      for you?
          Much too difficult                             1     1%
          Somewhat too difficult                        53    43%
          Just about right                              65    52%
          Somewhat too easy                              4     3%
          Much too easy                                  0     0%
          Omit                                           1     1%

22.   Did you feel frustrated by the difficulty of the test questions?
          Not at all                                    39    31%
          Somewhat                                      79    64%
          Fairly much so                                 6     5%
          Very much so                                   0     0%

Table 4.4b

Anxiety, Total Sample (N=124).

Item  Text of Item                                   Response to Options
                                                         N      %

23.   During testing, did you worry about how well you would do?
          Not at all                                    33    27%
          Somewhat                                      64    52%
          Fairly much so                                20    16%
          Very much so                                   7     6%

24.   Were you nervous while taking the test?
          Not at all                                    81    65%
          Somewhat                                      31    25%
          Moderately so                                 12    10%
          Very much so                                   0     0%

25.   How did you feel while taking the test?
          Very tense                                     1     1%
          Somewhat tense                                14    11%
          Neither tense nor relaxed                     43    35%
          Somewhat relaxed                              43    35%
          Very relaxed                                  23    19%

26.   Did nervousness while taking the test prevent you from doing your
      best?
          Yes, definitely                                4     3%
          Yes, somewhat                                 15    12%
          Probably not                                  71    57%
          Definitely not                                34    27%

Table 4.4c

Motivation, Total Sample (N=124).

Item  Text of Item                                   Response to Options
                                                         N      %

27.   How frequently were you careful to select what you thought was the
      best answer to each question?
          Almost always                                 66    53%
          Frequently                                    36    29%
          Sometimes                                     20    16%
          Rarely                                         2     2%
          Never                                          0     0%

28.   Did you feel challenged to do as well as you could on the test?
          Not at all                                     8     6%
          Somewhat                                      41    33%
          Fairly much so                                40    32%
          Very much so                                  35    28%

29.   Did you care how well you did on the test?
          I cared a lot                                 39    31%
          I cared some                                  62    50%
          I cared a little                              14    11%
          I cared very little                            7     6%
          I didn't care at all                           1     1%
          Omit                                           1     1%

students expressed high levels of motivation. Half of those taking
the test indicated that they were careful to choose the correct answer
almost every time. As with anxiety, it would be interesting to compare
these results with data obtained following a testing session of personal
significance to the student.

4.5 The Role of Feedback (Knowledge of Results) 

It is relatively simple to provide immediate feedback in an adaptive
testing system. However, because the effects of feedback on performance
appear to depend on complex interactions, it is not clear under what
circumstances feedback would facilitate or impair the performance of the
examinee.

Betz and Weiss (1976a, b) studied motivation, anxiety, and performance
as a function of provision of knowledge of results (KR) for high-
and low-ability examinees on adaptive and conventional tests. High-ability
examinees, overall, reported more motivation than low-ability
examinees. KR resulted in increased motivation for high-ability examinees
and decreased motivation for low-ability examinees. Furthermore, motivation
was higher on the conventional test for high-ability examinees
(where KR was probably mostly positive), and higher on an adaptive test
for low-ability examinees (where KR was probably more positive than for a
traditional test). In contrast, Means and Means (1971) found that
high-ability students performed better with negative KR, and low-ability
students performed better with positive KR. However, in this case KR was
given after the entire test. Item-by-item KR is psychologically quite
different from posttest KR. For example, a student who receives negative
feedback on the first few items of a test may give up at the beginning.
If the KR is provided after the test, the examinee may be motivated to
achieve a higher level of performance on a subsequent test. This distinction
is supported by Locke et al. (1968), who found that the motivational
effects of KR depend on the goals the examinee sets in response to the
KR.

Betz and Weiss (1976a, b) also found that high-ability examinees
reported less anxiety than low-ability examinees on the same type of
test. KR produced higher anxiety for the adaptive test and lower
anxiety for the conventional test for both ability groups. Hansen
(1974) found that high-anxious testees made more errors with feedback
than without it. Weiner and Adams (1974) found evidence that failure
and the anxiety it can induce may lead to more reflective responding on a
matching familiar figures test. Hansen (1974) also found that while
feedback helped the performance of high-reasoning testees, it impaired
the performance of low-reasoning testees.

The most striking finding of the Betz and Weiss (1976a) study was
that KR led to significant increases in test scores for the total
group of examinees. KR yielded greater performance improvement on the
conventional, as compared to the adaptive, test.

Prestwood (1977) studied the effects of KR on 561 undergraduates
using a modified stradaptive algorithm which yielded tests of high-
(40% correct), medium- (60% correct), or low-difficulty (80% correct).
Three conventional peaked tests were constructed to yield comparable
mean number-right scores. This study failed to replicate Betz and
Weiss's (1976a) finding of better performance in the KR condition.



Table 4.5a

Feedback, Total Sample (N=124)

Item  Text of Item                                   Response to Options
                                                         N      %

34.   Would getting feedback on the test make it:
          More interesting                              95    77%
          Less interesting                               4     3%
          Cannot say                                    25    20%

35.   Would getting feedback after each question make you nervous?
          Very nervous                                  17    14%
          Somewhat nervous                              29    23%
          Slightly nervous                              45    36%
          Not nervous at all                            32    26%
          Omit                                           1     1%

36.   How would you feel about getting feedback?
          I would rather not know whether my answers
          were right or wrong.                          14    11%
          I really don't care if I got feedback or
          not.                                           5     4%
          I would like getting feedback after each
          question.                                     33    27%
          I would like feedback at the end of the
          test.                                         72    58%

Nor did Prestwood find higher anxiety on the adaptive test as reported
by Betz and Weiss.

It appears that the effects of KR on performance and attitude
are extremely complex. Strang and Rust (1973) pointed out that in
order to test the effect of KR, it is necessary to control for intrinsic
KR of ongoing activity. If the examinee can estimate performance, KR
will be redundant. Controlling for intrinsic KR, Strang and Rust
found that the examinees were more nervous with immediate KR than without
it. Betz and Weiss (1976b) found that even though examinees taking
the adaptive test were more nervous with KR, 90% of all examinees in
the study said they liked the provision of KR. The actual proportion
of positive KR was related to attitude toward the test: the greater
the proportion of positive KR, the more favorable the examinee's attitude
toward the test. Prestwood (1977) also found that a high proportion of
examinees liked having KR.

In the current study, KR was not provided, although the design
of the CAT system makes provision of both item and total test feedback
a simple matter. The reason for not employing feedback was that the
goals of the study were to explore the performance of the BRTT, and
feedback would have added a confounding variable.

However, it was decided to include three items to which students
could respond by indicating their preferences regarding feedback.
These items were adapted from Prestwood's (1977) scale and are presented
in Table 4.5a. Three-quarters of the students felt that getting feedback
would make the test more interesting. Ninety-nine percent of the students in
Prestwood's study felt that feedback made the test more interesting.
Although 76% of the students in Prestwood's study felt that feedback
after each item did not make them nervous, only 26% of the subjects in
the present study agreed; 14% felt that item feedback would make them
very nervous.

4.6 Human Factors in the Computer/Human Interface

The ease with which the student is able to interact with the
computer is a critical factor in the testing experience. In the design
of the computer system employed in the present study, a careful effort
was made to develop a simple interaction protocol which would be
intuitive in operation and low in fatigue. Two factors in the design
were the development of a customized keypad to eliminate the need to
"hunt and peck" on a typewriter keyboard and the design of terse
instruction sets which were employed after the student had been exposed to
the complete, verbose instructions for a given item type. The use of terse
instructions reduced the reading load required.

Table 4.6a presents the results of two items (adapted from
Alderman, 1978) dealing with the human/computer interface and fatigue.

Table 4.6a

Human Factors, Total Sample (N=124)

Item  Text of Item                                   Response to Options
                                                         N      %

30.   Did the mechanics of using the computer terminal interfere with your
      taking the test?
          Not at all                                    78    63%
          Slightly                                      34    27%
          Somewhat                                      10     8%
          Very much so                                   2     2%

31.   How tiring did you find the computer-administered test?
          Very tiring                                    3     2%
          Somewhat tiring                               15    12%
          Slightly tiring                               46    37%
          Not tiring at all                             60    48%

4.7 Preference for Adaptive vs Conventional Testing

Table 4.7a indicates that a majority of students who had a preference
would prefer adaptive to conventional testing. It is difficult
to determine how well this preference would transfer to an actual
testing environment; the novelty and experimental nature of the study
are doubtless biasing factors. However, the lack of a strong negative
response to this question and others (such as difficulty, anxiety, and
motivation) suggests that students will be receptive to adaptive tests
as compared to conventional tests. The free responses to the incomplete
sentence blanks which follow may help clarify factors which
affected the students' response to the adaptive testing situation.



Table 4.7a
Preference for Adaptive vs Conventional Testing
Total Sample (N=124)

Item  Text of Item                                   Response to Options
                                                            N      %

32.   Compared to a "paper-and-pencil" multiple-choice test of the same
      length would you:
          Find the computer test more tiring?              13    10%
          Find both tests about the same?                  41    33%
          Find the paper-and-pencil test more tiring?      69    56%
          Omit                                              1     1%

33.   If you had a choice, would you prefer to take the PSAT as:
          A computer-administered test                     58    47%
          A pencil-and-paper test                          39    31%
          No preference                                    26    21%
          Omit                                              1     1%
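
The statement that a majority of students with a preference favored the computer can be checked from the item 33 counts in Table 4.7a. The short Python sketch below is illustrative only: the counts are copied from the table, and the variable names are assumptions introduced here.

```python
# Counts for item 33 of Table 4.7a (total sample, N = 124).
counts = {
    "computer": 58,
    "paper": 39,
    "no_preference": 26,
    "omit": 1,
}

total = sum(counts.values())

# Restrict the base to students who stated a preference either way.
with_preference = counts["computer"] + counts["paper"]

share_of_sample = counts["computer"] / total
share_of_preferring = counts["computer"] / with_preference

print(f"N = {total}")
print(f"Computer, share of total sample: {share_of_sample:.0%}")
print(f"Computer, share of those with a preference: {share_of_preferring:.0%}")
```

Roughly three in five of the students who expressed a preference chose the computer-administered form, while the same group is just under half of the total sample.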
Table 4.7b
Responses to Open-ended Items

I. What did you like best about the computer-administered test?

1. More interesting to take.

2. Less intimidating.

3. Less time consuming.

4. Not as tiring as paper-and-pencil test.

5. I prefer not looking ahead to other questions.

6. Easy to use and understand.

7. Less chance of making errors.

8. Taking the test alone (or with 1 or 2 others) is more private, so
you feel less pressured.

9. I liked not being rushed, and being able to take my time.

10. I liked not being interrupted by a Proctor.

11. The directions were clear.

12. There were no essay or math questions.

II. The thing I like least about this method of administering an examination is:

20. Not being able to go back to the previous question/answer to either
review or change.

21. Some of the letters were difficult to read, making concentration
harder.

22. Not knowing your score on the test.

23. The delayed time between the answer and the next question.

24. It was distracting for each question to be printed out letter by
letter.

25. The computer took some time to get ready.

26. The directions could have been a lot shorter.



27. The test was time-consuming and boring.

28. The screen bothered your eyes at times (possibly causing headaches).

29. Just staring at the screen became annoying.

30. The questions seemed harder to me than a written test.

31. Haphazard guessing was caused due to the faster, more relaxed test.

32. Felt I was being rushed.

33. Became reckless by the end of the test.

34. Only using 3 keys on the keyboard.

35. Having to press the "enter" key twice when you had no changes.

36. Takes time adjusting to use the computer.

37. Computers make me more nervous than the pencil-and-paper test.

38. Computers are being too widely used.

39. Not private enough, someone could read your answers if desired.

40. The location of the computers.

41. The uncomfortable chair.

42. Worried if something could go wrong with the computer.

43. I found the test too easy.

44. I found nothing wrong with the test, I really liked it.

III. Changes to consider about this method of administering an examination:

50. The inability of the user to review his answers and to make changes
if necessary.

51. Being able to tell the students what the results of the test are.

52. To see the whole test together at the end of the test.

53. The computer's response should be quicker.

54. The repeating of instructions.

55. After each question completed, there should be a print-out to how
many questions have been completed and how many left.

56. The visibility of the print-out should be more legible.

57. The letter "G".

58. Use a different letter style for the computer's print-out.

59. The whole question should appear at once on the screen, not printing
out letter by letter.

60. Getting practice using the computer before taking the test.

61. Don't believe in "Practice-test", the test should count.

62. The test should be taken in total privacy.

63. Questions dealing with different subjects should be considered.

64. Use the same kind of questions in a consecutive order.

65. The buzzing noise.

66. The chair!

67. The seating arrangement.

68. The place.

69. The machine should make some noises.

70. A way to make the student feel more relaxed.

71. Nothing has to be changed.

Chapter V
Implications and Recommendations

5.1 Overview

The current study has demonstrated the viability of adaptive testing
in the high school environment. From an operational point of view, the
system performed reliably and students found interaction with the computer
terminal to be a simple task. Psychometrically, the performance of the
Broad-Range Tailored Test was generally consistent with theoretical
expectations. A summary of the major findings of the study will be found
in section 3.13. The present chapter considers the implications of the
study for future development efforts.

The chapter is organized around four recommendations. The
recommendations are:

1. That the organization collaborate with an interested client to
develop an adaptive test for use in an educational setting.

2. That the potential for microprocessor-based systems for the
delivery of adaptive testing be evaluated.

3. That extensions to item response theory and the development of
alternative models for the provision of adaptive testing be
explored.

4. That high priority be accorded the development of innovative
assessment strategies for computer presentation. Such items
might involve simulation and gaming, constructed responses,
graphics, motion, sound, and time-dependent responses.

The four recommendations involve both development and research
components. Underlying these four recommendations is the belief that
adaptive testing has reached a state of development in which its practical
applications can be seriously contemplated. Although many areas exist in
which additional research is needed, research agendas will benefit
from the experiences of developing an operational system. A second
belief which underlies the recommendations is that the development of
operational adaptive testing will be facilitated by collaboration among
educators, technical staff, test development specialists, and psychometricians.

Sections 5.2 through 5.5 elaborate upon the four recommendations
enumerated above.

5.2 Recommendation 1: That the organization collaborate with an interested
client to develop an adaptive test for use in an educational setting.

Recommendation 1 proposes that the organization continue its work in
adaptive testing with the development of a test for use in an educational
setting. Although not all of the research problems related to adaptive
testing have been solved, there is much to be learned from a modest
operational project. Since the present study as well as previous studies
have supported the major theoretical predictions of adaptive testing
models, the development of a valid and reliable operational instrument is
a reasonable goal.

In selecting an operational project, several factors will be important.
A creative, flexible project team and a well-defined set of goals will be
crucial to the success of the project. The project should be one in
which the adaptive test fills a need which cannot readily be met by
conventional paper-and-pencil testing. The project should be modest in
its goals and be designed to facilitate future development efforts
through the collection of data appropriate to both formative and summative
evaluation.

A number of areas appear to hold promise for adaptive testing.
In the community college environment, for example, adaptive testing could
be used in conjunction with walk-in registration. Students desiring to
take English or mathematics courses could respond to a relatively short
test administered on a computer terminal which would determine the
appropriate placement for the student. Placement testing could be
integrated with the registration procedures, thus providing an effective
means of tracking large numbers of part-time students.

A second area in which adaptive testing appears to hold promise is
that of diagnosis. In order to be effective, diagnostic batteries must
be comprehensive in scope. When a large number of characteristics are
to be tested, the number of items which must be administered can become
unreasonably large. Because of its efficiency, adaptive testing would
be an effective alternative to the administration of batteries of
conventional tests. Multistage procedures in which routing tests are used to
perform gross discriminations and branch individuals to more molecular
assessment segments are feasible. Diagnostic testing might be used with
special populations such as the disadvantaged or in special education
settings. In conjunction with remedial programs, such diagnostic testing
could be integrated with instructional modules to create a comprehensive,
automated mastery learning environment.
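
The multistage idea described above, a short routing test followed by a branch to a finer-grained segment, can be sketched in a few lines of Python. This is a minimal illustration, not a procedure taken from the study: the segment names, the number-correct routing score, and the cut points are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    min_routing_score: int  # lowest routing score that branches here


# Hypothetical assessment segments, ordered from least to most advanced.
SEGMENTS = [
    Segment("basic skills", 0),
    Segment("intermediate skills", 4),
    Segment("advanced skills", 8),
]


def routing_score(responses):
    """Number-correct score on the short routing test (True = correct)."""
    return sum(1 for r in responses if r)


def route(score):
    """Return the most advanced segment whose threshold the score meets."""
    chosen = SEGMENTS[0]
    for segment in SEGMENTS:
        if score >= segment.min_routing_score:
            chosen = segment
    return chosen


# Example: a testee answers 5 of 10 routing items correctly.
answers = [True, True, False, True, False, True, False, False, True, False]
print(route(routing_score(answers)).name)
```

In an operational battery the routing rule would presumably rest on calibrated item parameters rather than a raw number-correct score; the fixed thresholds here only show the branching structure.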
At the present time the use of adaptive testing for selection should
be approached with caution. The advent of truth-in-testing legislation
may place constraints upon the security of the item pools used in adaptive
testing. This is conceptually no different from the problems facing
other testing programs. One solution might be to employ very large item
banks which could be published; the number of items in the pool would
have to be sufficiently large to prevent an individual from memorizing
the responses to the entire pool. However, this technique would require
considerable quantities of direct-access storage and may be impossible
to implement on current microprocessor-based systems.

The program manager must have an understanding of measurement,
computer technology, and educational practice. One important function
the program manager would serve is to facilitate the conceptualization of
the project in terms which are meaningful to both the client and the
project staff. As coordinator, communicator, and facilitator, the
program manager would maintain the project's momentum and direction.

The program manager must have sufficient technical expertise to
manage the technical aspects of the project. It is important that s/he
be able to evaluate technical alternatives and be able to communicate
with both technical and nontechnical individuals. A lack of communication
between technical and subject-matter experts is a common cause of
frustration and failure in projects of this nature.

The four specialists working on Level 2 will share responsibility
for design and implementation of the system. The technical specialist
would be responsible for system design, programming, and hardware selection.
Because technical subtleties are often mysterious to nontechnical individuals,
the project team will place heavy reliance upon the technical specialist
for support; this is especially important if the technical leader and
technical specialist roles are combined.

The test development specialist would be responsible for the test
specifications and items for the adaptive test. This individual should
be knowledgeable in item characteristic curve theory. A test development
specialist who is reasonably familiar with the capabilities of computers
would tend to be more creative than one to whom computers are unfamiliar.

The role of the statistical/psychometric staff member is a crucial
one. He or she would be responsible for designing the mathematical
foundation on which the test is constructed. These tasks include:
determination of the selection algorithm, development of the item pool
structure, calibration of items, choice of stopping rule, selection of
numerical analytic techniques, determination of score transformations,
development of equating methodologies for alternate forms, and the
conduct of simulation studies to validate the performance of the model. He
or she should also be well versed in item characteristic curve theory and
should have a good knowledge of previous research in computerized adaptive
testing. It would be helpful if he or she had some programming background,
particularly in the area of numerical analysis.
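
Two of the tasks listed above, the selection algorithm and the stopping rule, can be made concrete with a small sketch. The Python below assumes a three-parameter logistic item characteristic curve with the conventional scaling constant 1.7, selects the unused item with the greatest information at the current ability estimate, and stops after a fixed number of items. The item parameters are invented for illustration, and the code is not the BRTT's own selection procedure.

```python
import math
from dataclasses import dataclass

D = 1.7  # conventional logistic scaling constant


@dataclass(frozen=True)
class Item:
    a: float  # discrimination
    b: float  # difficulty
    c: float  # lower asymptote (chance success)


def p_correct(item, theta):
    """Three-parameter logistic item characteristic curve."""
    return item.c + (1.0 - item.c) / (1.0 + math.exp(-D * item.a * (theta - item.b)))


def information(item, theta):
    """Item information for the three-parameter logistic model."""
    p = p_correct(item, theta)
    q = 1.0 - p
    return (D * item.a) ** 2 * (q / p) * ((p - item.c) / (1.0 - item.c)) ** 2


def select_next(pool, administered, theta):
    """Index of the unused item with maximum information at theta."""
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates, key=lambda i: information(pool[i], theta))


def should_stop(n_administered, max_items=25):
    """Fixed-length stopping rule (25 items, as in the test described here)."""
    return n_administered >= max_items


# Hypothetical three-item pool: easy, medium, and hard items.
pool = [Item(1.0, -2.0, 0.2), Item(1.0, 0.0, 0.2), Item(1.0, 2.0, 0.2)]
first = select_next(pool, set(), theta=0.1)
print("first item index:", first)
```

A variable-length stopping rule, for example one based on the standard error of the ability estimate, would replace `should_stop` without changing the rest of the sketch.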

The Senior Research Assistant would be responsible for maintaining
item files, helping prepare the system documentation, assisting in the
preparation of user manuals, and generally providing support for the wide
range of administrative functions which a project of this nature requires.

5.3 Recommendation 2: That the potential for microprocessor-based
systems for the delivery of adaptive testing be evaluated. This
evaluation should include three models: a time-sharing model; a
stand-alone microprocessor model; and a network model in which a
host mainframe computer supports a network of microprocessors.

Over the last decade there has been a tremendous increase in
the sophistication of design and fabrication techniques for electronic
circuitry. The class of circuits which have resulted from these new
technologies, known generically as microelectronic components, have been
used in a variety of applications ranging from digital watches and
hand-held calculators to electronic computers and satellites.
Microelectronic circuits are characterized by a high degree of integration.
Thousands of transistors and other circuit components are fabricated on a
thin silicon wafer or "chip" whose measurements are typically .16" x .22"
(Noyce, 1977). It is difficult to overestimate the impact of
microelectronic developments on computer technology. The microelectronics
revolution is far from over and the technology is advancing at an unprecedented
rate.

To appreciate the magnitude of the size and cost reductions which
have occurred as a result of microelectronics, compare ENIAC (Electronic
Numerical Integrator and Computer), the first electronic computer, with
a typical microprocessor. Designed by Eckert and Mauchly at the University
of Pennsylvania and operational in 1946, ENIAC weighed 30 tons, required
150 kilowatts of electricity, and contained in excess of 20,000 vacuum
tubes. ENIAC was capable of multiplying two 10-digit numbers in 0.003
seconds; its memory size was 150 locations. Reliability was a major
concern. With over 20,000 vacuum tubes in ENIAC, the mean time between
random tube failures approached the time required to locate and replace
the malfunctioning tube. This posed a serious problem for future
development, since it was hypothesized that computers using larger
numbers of vacuum tubes would rapidly approach zero operational time due
to the large number of failures expected.

In contrast, consider a typical microprocessor chip, INTEL
Corporation's 8085 Microprocessor. The 8085 is fabricated on a single chip
which measures .164" x .222". The chip contains 6,200 transistors and is
capable of decoding over 300 instructions. It can execute 770,000
instructions per second. The manufacturing cost of an 8085 chip is
measured in pennies; its retail cost is several dollars. The chip is
rugged, reliable, and may be powered by batteries.

It is evident that the computer is no longer an expensive laboratory
device and the availability of microprocessors profoundly alters the
cost/benefits of applying educational technology to the classroom. Cost
reduction trends are expected to continue as manufacturing technology
becomes increasingly sophisticated. As Figure 5.3a shows, the number of
components per circuit has doubled every year since 1959. Figure 5.3b
shows the actual and projected decline of the cost/bit of computer memory
for the years 1973-83.
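
The force of annual doubling is easier to appreciate with a worked calculation. The sketch below assumes, purely for illustration, a starting point of one component per circuit in 1959 and exact doubling each year thereafter; the function name and default are introduced here and are not taken from the figure.

```python
def components_per_circuit(year, base_year=1959):
    """Components per circuit under exact annual doubling.

    Assumes (as an illustration) one component per circuit in base_year.
    """
    if year < base_year:
        raise ValueError("year must not precede the base year")
    return 2 ** (year - base_year)


for year in (1959, 1964, 1969, 1974, 1977):
    print(year, components_per_circuit(year))
```

Under these assumptions eighteen doublings between 1959 and 1977 give 2^18, or 262,144, components per circuit, which conveys why even modest continued doubling changes the economics of classroom computing.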

At the time that the CAT system was designed, general purpose
microcomputers were not readily available. For this reason, the CAT system was
designed to run upon a conventional time-sharing system. Figure 5.3c
illustrates the structure of the time-sharing system employed for the CAT
project. As can be seen from this illustration, a single central computer
services multiple students using a single set of items maintained on magnetic
disks. Communication between the student at the terminal and the central
computer occurs across telephone lines.

At the present time, general purpose microprocessor systems have
become available, among them some designed especially for educational
use. Unlike time-shared systems, each user of a microprocessor system
has sole use of the processor. In a microprocessor system, the actual
computer circuitry represents a small fraction of the total cost; the
most expensive components tend to be such items as the keyboard, disk
drive, and display tubes. For this reason it is rarely economical to
time-share microprocessor systems. Figure 5.3d illustrates a typical
microprocessor configuration which might be employed for adaptive testing.

Microprocessor systems have both advantages and disadvantages as
compared to conventional time-shared systems. The major disadvantages of
microprocessor systems are that they do not yet have the storage capabilities
typical of large-scale processors, and their computational power, although
impressive, tends to be less than that of large-scale systems. In
effect, microprocessors are scaled down, compared to larger processors.
The disadvantages of smaller-scale processors, however, are balanced by a
number of important advantages. A major advantage of microprocessor
systems is that they cost far less than large-scale systems. It is
possible to purchase a microprocessor system with 48K bytes of memory
and a "floppy" disk for about $3,000. In contrast, the retail price of
the PDP-11/40 computer used in the current study was approximately
$150,000. It is evident that a large number of microprocessors can be
purchased for the cost of a single large-scale system.

A second advantage of microprocessor systems is that hardware
malfunction affects only a single testing station. In a time-shared
system, failure of the central processor causes all terminals to stop
operating. A third advantage of the microprocessor-based system is that
its operating costs tend to be lower. Large-scale systems generally
require the attention of an operator at the central site, whereas in
microprocessor-based systems the user serves as operator. Additionally, since
the microprocessor is located at the testing site, telephone lines are
not needed for data transmission, a factor which can result in
considerable cost savings.

As shown in Figure 5.3d, the microprocessor can support low-cost
input/output devices which may facilitate testing. For example, the
microprocessor can control a tape recorder or speech synthesizer to
provide audio stimuli. A light pen would permit the testee to point to
the chosen option or to part of a diagram. Image storage devices such as
slide projectors or microfiche can provide randomly accessed graphics.

Although microprocessors appear to offer considerable advantages for
adaptive testing, it is not feasible simply to transfer the current CAT
system to a microprocessor. One reason for this is that the storage
capabilities of "floppy" disks are considerably less than those of the
disks currently used. Analysis needs to be performed in order to determine
whether the storage capabilities of microprocessor systems are adequate
to meet the needs of an adaptive testing environment. In addition,
analysis of the numerical algorithms employed in the current systems to
determine their transferability to microprocessor systems needs to be
undertaken. It is probable that current microprocessor systems will
prove capable of supporting adaptive testing. Urry (1979) has
demonstrated an adaptive verbal ability test which employs a microprocessor.

Figure 5.3e illustrates the design of a system which employs a network
consisting of a central host processor and remote microprocessor testing
stations. In this system the microprocessor stations function as
independent testing stations as in the microprocessor model (Figure 5.3d)
but also have the capability of two-way communication with the central
processor. Using this communications capability the microprocessor could
transmit registration and item response data to the central computer for
score reporting and item analysis. For example, a student might take
an adaptive test which includes several experimental items. The item
responses would be transmitted to the central computer. Test items could
be archived for subsequent score reporting and responses to the
experimental items could be used for item analysis. Since the network
could also support communication from the central processor to the
microprocessor station, the central facility could transmit new tests
to the microprocessor.
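
The data flow in the network model can be illustrated with a small sketch of the record a testing station might transmit and of the split the central site might perform between operational and experimental responses. Everything in the Python below is hypothetical: the field names, the identifiers, and the choice of JSON as the message format are assumptions made for the example, not features of the system described.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ItemResponse:
    item_id: str
    answer: str
    experimental: bool = False  # True for unscored tryout items


@dataclass
class SessionRecord:
    student_id: str
    responses: list = field(default_factory=list)

    def encode(self):
        """Serialize the record for transmission to the central computer."""
        return json.dumps(asdict(self))


def split_for_processing(message):
    """Central-site step: separate operational from experimental responses."""
    record = json.loads(message)
    operational = [r for r in record["responses"] if not r["experimental"]]
    experimental = [r for r in record["responses"] if r["experimental"]]
    return record["student_id"], operational, experimental


# Station side: build and encode a record (identifiers are made up).
record = SessionRecord(
    "S001",
    [ItemResponse("V101", "B"), ItemResponse("X900", "D", experimental=True)],
)
student, scored, tryout = split_for_processing(record.encode())
print(student, len(scored), "operational,", len(tryout), "experimental")
```

Operational responses would feed score reporting, and the experimental responses would accumulate for item analysis, mirroring the example in the text.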

It is evident that the microprocessor has considerable potential as
a vehicle for adaptive testing. It is therefore recommended that the
organization evaluate the potential of microprocessor-based systems for
the delivery of adaptive testing. As part of these evaluations it is
suggested that two hardware technologies of high potential significance,
bubble memory and image processing, be examined.

Bubble memory exploits a recently discovered property of certain
crystalline materials. These materials have the characteristic that, when
a wafer is magnetized in a direction perpendicular to the plane of the
largest surface, magnetic zones known as "bubbles" appear. Bubble
memories can do everything that disks do; that is, they can store large
amounts of data and provide random access to any portion of the data.
But unlike rotating magnetic disks, bubble memory performs these functions
electronically rather than mechanically. Unlike disks, bubble memories
have no motors, rotating magnetic surfaces, or moving heads and would
offer significant advantages in an educational environment in which rough
handling and a lack of trained maintenance personnel are the norm.
Further, unlike conventional computer memories, bubble memory is
nonvolatile; the data in it are maintained even when power to the system is
turned off. It is predicted that bubble devices will store data extremely
accurately, and without loss, for over a century. Because of its relatively
low cost and high reliability, bubble memory appears to be a potential
storage medium for item banks. Some bubble memory chips are now
commercially available and have been employed in text processing systems.

A second technology of major importance to adaptive testing is image
processing. Traditionally, computers have been extremely useful for
processing text material but have not had the capabilities of storing and
From Microelectronics by Robert N. Noyce. Copyright © 1977
Figure 5.3b Actual and Projected Cost/Bit of Computer Memory
1973-1983

From Microelectronics by Robert N. Noyce. Copyright © 1977
by Scientific American, Inc. All rights reserved.
retrieving graphic images at reasonable cost. Recently, however, image
processing devices have been designed that permit the storage of graphic
material including (in some cases) color and motion. Various image
processing technologies are available. The plasma display technology
employed in the PLATO system is one example. Video discs, because of
their large storage capacity, random access, and ability to produce
motion, appear to have significant potential in testing environments.
Recent advances in microform technology may also provide inexpensive
random access to graphic data.

5.4 Recommendation 3: That extensions to item response theory and the
development of alternative models for the provision of adaptive testing
be explored. Such models might include item selection strategies especially
suited to microprocessors; multidimensional trait models; models for
achievement testing; and models for use in diagnostic and mastery learning
environments in which items are linked to learning objectives. A related
area of importance is the construction of adaptive test batteries in
which branching occurs at the test level as well as the item level.

As was pointed out in Chapter 1, the Broad-Range Tailored Test
implements one of a number of designs which might have been chosen for
adaptive testing. Among alternatives to the maximum-likelihood estimation
procedure are Bayesian estimation procedures and the Weiss (1973)
stradaptive procedure. Even within the context of maximum likelihood
estimation, alternative stopping rules, different choices for the initial
item, variations in step size, and alternative structures for the item
pool could have been employed. There are many research issues which
deserve exploration. Although a few will be mentioned in this document,
it is recommended that the input of interested staff be solicited regarding
future developments in this area.
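
The practical difference between maximum likelihood and Bayesian estimation can be seen in a short sketch. The Python below assumes a three-parameter logistic model, searches a fixed grid of ability values from -4 to 4, and uses a standard normal prior for the Bayesian (posterior mean) estimate; the item parameters, the grid, and the prior are all choices made for illustration rather than features of any procedure named in the text.

```python
import math

D = 1.7  # conventional logistic scaling constant
GRID = [i / 10 for i in range(-40, 41)]  # ability values from -4.0 to 4.0


def p_correct(theta, a, b, c):
    """Three-parameter logistic item characteristic curve."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def log_likelihood(theta, items, responses):
    """Log-likelihood of a response pattern (True = correct) at theta."""
    total = 0.0
    for (a, b, c), correct in zip(items, responses):
        p = p_correct(theta, a, b, c)
        total += math.log(p) if correct else math.log(1.0 - p)
    return total


def ml_estimate(items, responses):
    """Grid-search maximum likelihood estimate of ability."""
    return max(GRID, key=lambda t: log_likelihood(t, items, responses))


def bayes_estimate(items, responses):
    """Posterior mean of ability under a standard normal prior."""
    weights = [
        math.exp(log_likelihood(t, items, responses) - 0.5 * t * t) for t in GRID
    ]
    return sum(t * w for t, w in zip(GRID, weights)) / sum(weights)


# Hypothetical items given as (a, b, c) triples.
items = [(1.0, -1.0, 0.2), (1.0, 0.0, 0.2), (1.0, 1.0, 0.2)]

all_correct = [True, True, True]
print("ML, all correct:   ", ml_estimate(items, all_correct))
print("Bayes, all correct:", round(bayes_estimate(items, all_correct), 2))
```

With an all-correct pattern the likelihood keeps rising with ability, so the maximum likelihood estimate lands on the edge of the grid, while the prior keeps the Bayesian estimate finite. This is one reason the two approaches behave differently early in a test, before a testee has both passed and failed items.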

One area of interest is the development of multidimensional models.
The unidimensionality assumption of latent trait theory has been taken to
be a serious constraint by some researchers; this is particularly true in
the domain of achievement testing in which the test typically involves a
multidimensional space. Sympson (1977) has developed a multidimensional
latent trait model for dichotomously scored multiple-choice items. Urry
(1977) has developed a multidimensional Bayesian approach to tailored
testing. Many questions remain to be answered about the appropriateness
of multidimensional models and of the most effective computational
techniques for their use.

Another area of potential interest is the development of item
selection strategies especially suited to microprocessors. Jones (1979,
unpublished manuscript) has argued that strategies based on sequential
analysis may be pressed into service for use in adaptive testing with
considerable savings in computational time.
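
To give a sense of why sequential analysis is computationally light, the sketch below implements Wald's sequential probability ratio test for a simple mastery decision. It is a standard textbook procedure substituted here for illustration, not the strategy of the unpublished manuscript cited above, and the proportions correct assumed for non-masters (0.5) and masters (0.8) and the error rates are hypothetical.

```python
import math


def sprt(responses, p0=0.5, p1=0.8, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test for mastery.

    p0 and p1 are the assumed probabilities of a correct response for a
    non-master and a master; alpha and beta are the nominal error rates.
    Returns (decision, number of items used).
    """
    upper = math.log((1.0 - beta) / alpha)   # cross this: classify as master
    lower = math.log(beta / (1.0 - alpha))   # cross this: classify as non-master
    step_correct = math.log(p1 / p0)
    step_wrong = math.log((1.0 - p1) / (1.0 - p0))

    llr = 0.0  # running log-likelihood ratio
    for n, correct in enumerate(responses, start=1):
        llr += step_correct if correct else step_wrong
        if llr >= upper:
            return ("master", n)
        if llr <= lower:
            return ("non-master", n)
    return ("undecided", len(responses))


print(sprt([True] * 10))
print(sprt([False] * 10))
print(sprt([True, False]))
```

Each response requires only one addition and two comparisons, with the logarithms computed once in advance, which is the kind of economy that would matter on a small processor.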

Most of the work in adaptive testing has involved the estimation of
ability. In a mastery learning environment the estimation of achievement
is generally far more important. Research needs to be conducted into the
development of adaptive testing models which can be used for measuring
achievement, especially models which link items to learning objectives.
It is possible that such models could employ item response theory in
unidimensional or multidimensional models. However, it would be useful
to explore non-trait models. The concept that individuals possess fixed
traits has been challenged by such social learning theorists as Bandura
and Mischel, who argue for alternative constructs such as response
tendency and state. True score models may not be consistent with a
social learning perspective which stresses situational variables.
Social learning models place an increased emphasis on individual
differences. All too often individual differences have been considered "error,"
even though individual variation may be a valuable source of
information about a person. Since (unlike a paper-and-pencil instrument)
the computer has the capability of adapting to individual differences,
individual difference models may play a key role in computerized testing.
Tests which yield detailed individualized profiles for diagnosis and
educational prescription are desirable. Unfortunately, comprehensive
profiles are difficult to construct through conventional tests because of
the large number of items needed to obtain reliable scores. Adaptive
testing will be useful in areas where comprehensive individual profiles
are desired. Thus, if a dichotomous classification (select vs. reject,
remedial vs. standard) were to be made on the basis of a preestablished
"cut score," a conventional test might be most appropriate. However, if
the purpose of the test were to design an individualized educational
program or to place an individual in an appropriate vocational training
program, an adaptive test would tend to be more effective. In developing
profiles, the computer can present batteries of adaptive tests in which
branching would take place among subtests as well as among items. As
with single tests, adaptive batteries can be conceptualized in both
unidimensional and multidimensional form.

Because of the multiplicity of the research issues and the
limitations of the resources available, Recommendation 1 suggests that research
priorities be guided by development priorities. Much of the research
proposed can take place in the context of developing adaptive tests for
use outside the laboratory. Some laboratory research is desirable
because of the greater possibilities for experimental control; thus,
analytic and simulation models may be cost-effective techniques for model
development in the early stages. It should also be noted that adaptive
testing research can provide useful insights into the construction of
paper-and-pencil tests. There is little reason for isolating adaptive
testing research from the mainstream of psychometric research since there
is much to be gained from its integration.

5.5 Recommendation 4: That high priority be accorded to the development
of innovative item types for computer presentation. Such items might
involve simulation and gaming, constructed responses, graphics, motion,
sound, and time-dependent responses.

The objective multiple-choice item has been the mainstay of testing
for many years. So entrenched is this format that some people might be
tempted to conclude it is the most desirable. In fact, the
multiple-choice objective item has many advantages, including its standardization,
the limited range of response possibilities it allows, its fit with
psychometric models, and the ease with which it may be scored. However,
in many respects the objective multiple-choice item is an artifact of the
economics involved in mass testing of large numbers of persons. As
a measurement option, the multiple-choice item has a number of limitations.
One limitation is that the major cognitive process measured by the
multiple-choice item is the individual's ability to discriminate a
correct response from among a series of alternatives. It is well
established that the psychological processes involved in recognition are
different from other important processes such as recall, synthesis, and
evaluation which the objective multiple-choice item can only measure
indirectly. Because the multiple-choice item cannot readily measure
divergent responses, it is limited in its ability to assess problem
solving.

The computer has the potential to free the test developer from the
constraints imposed by a multiple-choice format. Many novel item formats
are possible. Items presented by computer may employ constructed responses.
Probabilistic response models in which an individual weights different
alternatives may be employed. Items may be constructed that require
time-dependent responses; for example, an individual may be asked to
listen to a conversation and press a button when a grammatical error
occurs. Items may employ graphics; for example, mathematical concepts
may be tested by asking the individual to group objects,
draw lines, or construct angles. Process conceptions may be tested by
asking the test taker to follow the flow of a process diagram with a
light pen. The possibilities of computer-administered items that have
the potential to support novel forms of testing go far beyond the
capabilities of the objective multiple-choice item.
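
One way a probabilistic response item might be scored is with a quadratic scoring rule, which rewards weight placed on the correct alternative and penalizes weight spread over the others. The sketch below is offered only as an illustration of the idea; the choice of the quadratic rule and the function name are assumptions, and the text does not specify any particular scoring model.

```python
def quadratic_score(weights, correct_index):
    """Score a probabilistic response with a quadratic scoring rule.

    weights: nonnegative weights the testee assigns to each alternative;
    they are normalized to sum to 1 before scoring.
    Returns a score between -1 (all weight on one wrong option)
    and 1 (all weight on the correct option).
    """
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    probs = [w / total for w in weights]
    return 2.0 * probs[correct_index] - sum(p * p for p in probs)


# A testee fairly sure of option A on a four-option item where A is correct.
print(round(quadratic_score([0.7, 0.1, 0.1, 0.1], correct_index=0), 2))
# A testee with no idea spreads the weight evenly.
print(round(quadratic_score([1, 1, 1, 1], correct_index=0), 2))
```

Under this rule a testee maximizes the expected score by reporting weights that match his or her actual degree of belief, which is the property that makes such rules attractive for partial-knowledge scoring.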

It is recommended that the organization accord high priority to the
exploration of computer-presented items because new techniques may form
the basis for novel assessment procedures. Weiss (1977) has predicted
that "the multiple-choice item will disappear and exist only in museums.
We will learn how to use graded responses, continuous responses, and free
responses; and in the process we will humanize testing even a little bit
more, not only by adapting the test to individual differences and abilities
and other variables, but also by allowing people to respond in a
more natural way than is allowed by multiple choice tests."

As discussed in Recommendation 1, the development of creative
solutions requires synergy among experts in test development, computer
technology, psychometrics, and education. Few test development
professionals have sufficient knowledge of the capabilities of computers to evaluate
the potential of this technology. Technically oriented persons rarely
understand the subtleties of test development. Psychometric support is
needed to develop models for scoring and interpretation of novel assessment
procedures and to ensure that new techniques developed rest on a firm
theoretical foundation. Finally, it is important that psychologists and
educational specialists be involved so that the assessment techniques
developed bear a relationship to real problems and real people. The
importance of a multidisciplinary approach cannot be overemphasized.
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Chapter VI 

Preliminary Study 

Before conducting the main study, it was necessary to investigate
the effectiveness of the human/computer interface. Accordingly, a
pilot study was conducted:

1. To determine if students of various levels and ages could
   interact with the computer system easily and without confusion.

2. To determine if modifications to the system protocols would
   yield a more effective human/computer interface.

3. To determine the length of time required to administer
   the BRTT and the extent to which fatigue factors would affect
   performance when both forms were administered.

4. To determine if practice effects would systematically bias
   the relationship between the first and second form administered.

5. To refine the posttest questionnaire on the basis of student
   feedback.

6. To refine the instruction given to the students.

Students participating in the preliminary study were 5th grade
students (N=3), 7th grade students (N=3), high school sophomores (N=11),
and adults (N=6).

The high school students, the 7th grade students, and three of
the adults were given both forms of the BRTT. The remainder of the
subjects were given a single form of the BRTT. The time required to
take each form of the BRTT was noted, as were technical difficulties,
requests for help, and apparent ease of the human/computer protocol.
Following the tests, the subjects were asked their opinions of the
experience. The high school students were asked to respond to a formal
set of questions and to participate in an unstructured discussion group
in which reactions to the testing experience were discussed.







The subjects were able to complete the tests in a relatively
short time. The mean time for administration of Form A of the BRTT
was 18 minutes; for Form B it was 18.2 minutes. Only one subject (a
fifth grader) required more than 25 minutes to complete the test. For
the 11 high school students, ability estimates ranged from .10 to 1.83;
difference scores ranged from a low of .08 to a high of .68. The
test-retest correlation for the high school students was .53.
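The difference scores and the test-retest correlation reported above can be illustrated with a short sketch. The ability estimates used here are hypothetical values chosen only for illustration, since the individual pilot scores are not reported; the function names are ours and are not part of the BRTT software. Difference scores are taken as absolute differences between the two forms, which is an assumption.

```python
import math

def difference_scores(form_a, form_b):
    """Absolute difference between each subject's two ability estimates."""
    return [abs(a - b) for a, b in zip(form_a, form_b)]

def pearson_r(x, y):
    """Pearson product-moment correlation between paired scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical Form A / Form B ability estimates for five subjects.
form_a = [0.10, 0.55, 0.90, 1.20, 1.83]
form_b = [0.40, 0.35, 1.10, 0.95, 1.60]

diffs = difference_scores(form_a, form_b)
r = pearson_r(form_a, form_b)
print([round(d, 2) for d in diffs], round(r, 2))
```

With only 11 subjects, a correlation computed this way has a wide sampling error, so the pilot value is best read as a rough indication rather than a reliability estimate.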

The students generally felt that using the terminal rather than
a paper-and-pencil test was more enjoyable. They felt the terminal to be
less fatiguing and generally a beneficial experience. More than half of
the students said that they were relaxed during the testing situation.
It was observed that most students preferred the idea that answer sheets,
test booklets, etc., were not necessary.

Some students noted that they felt pressured whenever someone
completed the test ahead of them. (Students were assured, however, that
this was not a timed test.) Others mentioned that the "buzzer" that
sounded when an error was made was startling.

A common complaint was that the letter "g" on the DEC VT52 terminal
was hard to read. Unfortunately, this was a hardware function and could
not be changed.

One case of hardware failure occurred. This was determined to be 
due to static electricity resulting from a chair's friction with the 
carpet. 

Following are the high school students' responses to the posttest
questionnaire.

Table 6.1a

Responses of High School Students
to Posttest Questions

Item                                          Response                N     %

A. Previous Experience with Computers

 1. Are you at all familiar with computers?   Yes
                                              No

 2. Have you ever punched computer cards at   Yes
    a keypunch machine before?                No

 3. Have you ever interacted with a computer  Yes
    by means of a terminal before?            No

B. Perception of Test Difficulty

 4. How often did you feel that the questions Always
    in the test were too easy for you?        Frequently
                                              Sometimes
                                              Seldom
                                              Never

 5. How often did you feel that the questions Always
    in the test were too hard for you?        Frequently
                                              Sometimes
                                              Seldom
                                              Never

 6. On how many of the questions did you      Almost all
    guess?                                    More than half
                                              About half
                                              Less than half
                                              Almost none

 7. How often were you sure that your         Almost always
    answers to the questions were correct?    More than half
                                              About half
                                              Less than half
                                              Almost never

 8. In relation to your vocabulary ability,   Too difficult           0     0
    how difficult was the test for you?       Somewhat difficult      9    82
                                              About right             2    18
                                              Somewhat easy           0     0
                                              Too easy                0     0

 9. Did you feel frustrated by the            Not at all              3    27
    difficulty of the test questions?         Somewhat                7    64
                                              Fairly much so          1     9
                                              Very much so            0     0

C. Anxiety and Motivation

10. During testing, did you worry about       Not at all              1     9
    how well you would do?                    Somewhat                8    73
                                              Fairly much so          2    18
                                              Very much               0     0

11. Were you nervous while taking the test?   Not at all              9    82
                                              Somewhat                1     9
                                              Moderately so           1     9
                                              Very much so            0     0

12. How did you feel while taking the test?   Very tense              0     0
                                              Somewhat tense          2    18
                                              Neutral                 1     9
                                              Somewhat relaxed        5    45
                                              Very relaxed            3    27

13. Did nervousness while taking the test     Definitely              0     0
    prevent you from doing your best?         Somewhat                1     9
                                              Probably not            6    54
                                              Definitely not          4    36

14. How frequently were you careful to        Almost always           6    54
    select what you thought was the best      Frequently              5    45
    answer to each question?                  Sometimes               0     0
                                              Rarely                  0     0
                                              Never                   0     0

15. Did you feel challenged to do as well     Not at all              0     0
    as you could on the test?                 Somewhat                3    27
                                              Fairly much
                                              Very much               3    27

16. Did you care how well you did on the      Yes, a lot              3    27
    test?                                     Yes, a little           8    73
                                              A little                0     0
                                              Very little             0     0
                                              Not at all              0     0

D. Factors Related to Computer Administration

17. Did the mechanics of using the computer   Not at all              8    73
    terminal interfere with your taking       Slightly                3    27
    the test?                                 Somewhat                0     0
                                              Very much               0     0

18. How tiring did you find the computer-     Very                    0     0
    administered test?                        Somewhat                1     9
                                              Slightly                4    36
                                              Not at all              6    54

19. Compared to a "paper-and-pencil"
    multiple-choice test of the same
    length would you:
      Find the computer test more tiring?                             0     0
      Find both tests about the same?                                 1     9
      Find the paper-and-pencil test more                            10    91
      tiring?

20. Which would you prefer to take?
      A computer-administered test                                    9    82
      A pencil-and-paper test                                         0     0
      No preference                                                   2    18

E. Desirability of Item Feedback

    The computer could score each item as you answer it and
    tell you if your choice was right or wrong. This is
    called feedback.

21. Would getting feedback on the test        More interesting        8    73
    make it:                                  Less interesting        1     9
                                              Cannot say              2    18

22. Would getting feedback after each         Very nervous            1     9
    question make you nervous?                Somewhat nervous        2    18
                                              Slightly nervous        3    27
                                              Not nervous at all      5    45

23. How would you feel about getting
    feedback?
      I would rather not know whether my                              3    27
      answers were right or wrong.
      I really don't care if I got feedback                           1     9
      or not.
      I would like getting feedback.                                  7    64


Appendix A
Posttest Questionnaire

The posttest questionnaire was designed to obtain demographic data
regarding the student population and attitudinal data about student
reaction to the adaptive test. The results of the questionnaire are
reported in Chapter IV. Responses to the pilot version of the questionnaire
will be found in Chapter VI, which describes the preliminary study. This
appendix contains the questionnaire in its original form. The attitude
variables assessed by the questionnaire are: prior familiarity with
computers, subjective perception of difficulty, anxiety, motivation,
human factors in the computer/human interface, preference for adaptive
vs. conventional testing, and feedback. To facilitate comparison with
previous research, the items on the questionnaire were adapted from items
used by previous investigators.

Most of the questions were of the objective multiple-choice type.
Three of the items used an incomplete-sentence blank format
that permitted free response.

Please provide for us the following background information. Your responses
will be kept strictly confidential. If you strongly object to answering any
question please feel free to omit it.

Name

Date of Birth

Your Sex    Male    Female

High School Attended

Years in School

Have you taken the PSAT?    Yes    No

If yes, when did you take it?

Have you taken the SAT?    Yes    No

If yes, when did you take it?

Are you planning to take the SAT at a future time?    Yes    No

If yes, when

What is your High School average?

What is the highest possible average at your school?

What are your plans when you graduate from High School?

    attend 4 year college

    attend 2 year college

    work

    other

Are you at all familiar with computers?    Yes    No

Have you ever punched computer cards at a keypunch machine before?

    Yes    No

Have you ever interacted with a computer by means of a terminal before?

    Yes    No



How often did you feel that the questions in the test were too easy for
you?

    a. Always
    b. Frequently
    c. Sometimes
    d. Seldom
    e. Never

How often did you feel that the questions in the test were too hard for
you?

    a. Always
    b. Frequently
    c. Sometimes
    d. Seldom
    e. Never

On how many of the questions did you guess?

    a. Almost all of the questions
    b. More than half of the questions
    c. About half the questions
    d. Less than half of the questions
    e. Almost none of the questions or never

How often were you sure that your answers to the questions were correct?

    a. Almost always
    b. More than half of the time
    c. About half of the time
    d. Less than half of the time
    e. Almost never

In relation to your vocabulary ability, how difficult was the test for you?

    a. Much too difficult
    b. Somewhat too difficult
    c. Just about right
    d. Somewhat too easy
    e. Much too easy

Did you feel frustrated by the difficulty of the test questions?

    a. Not at all
    b. Somewhat
    c. Fairly much so
    d. Very much so

During testing, did you worry about how well you would do?

    a. Not at all
    b. Somewhat
    c. Fairly much so
    d. Very much

Were you nervous while taking the test?

    a. Not at all
    b. Somewhat
    c. Moderately so
    d. Very much so

How did you feel while taking the test?

    a. Very tense
    b. Somewhat tense
    c. Neither tense nor relaxed
    d. Somewhat relaxed
    e. Very relaxed

Did nervousness while taking the test prevent you from doing your best?

    a. Yes, definitely
    b. Yes, somewhat
    c. Probably not
    d. Definitely not

How frequently were you careful to select what you thought was the best
answer to each question?

    a. Almost always
    b. Frequently
    c. Sometimes
    d. Rarely
    e. Never

Did you feel challenged to do as well as you could on the test?

    a. Not at all
    b. Somewhat
    c. Fairly much so
    d. Very much so

Did you care how well you did on the test?

    a. I cared a lot
    b. I cared some
    c. I cared a little
    d. I cared very little
    e. I didn't care at all

Did the mechanics of using the computer terminal interfere with your
taking the test?

    a. Not at all
    b. Slightly
    c. Somewhat
    d. Very much so

How tiring did you find the computer-administered test?

    a. Very tiring
    b. Somewhat tiring
    c. Slightly tiring
    d. Not tiring at all

Compared to a "paper-and-pencil" multiple-choice test of the same length
would you

    a. Find the computer test more tiring?
    b. Find both tests about the same?
    c. Find the paper-and-pencil test more tiring?

Which would you prefer to take?

    a. A computer-administered test
    b. A pencil-and-paper test
    c. No preference

The computer could score each item as you answer it and tell you if your
choice was right or wrong. This is called feedback.

Would getting feedback on the test make it:

    a. More interesting
    b. Less interesting
    c. Cannot say

Would getting feedback after each question make you nervous?

    a. Very nervous
    b. Somewhat nervous
    c. Slightly nervous
    d. Not nervous at all

How would you feel about getting feedback?

    a. I would rather not know whether my answers were right or
       wrong.
    b. I really don't care if I got feedback or not.
    c. I would like getting feedback.

What did you like best about the computer-administered test?



What did you like least?



How could we change the test to improve it?



Appendix B

Description of the Item Pool
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SUMMARY SHEET 



ITEM TYPE

1 - Synonyms

2 - Opposites

3 - Incomplete Sentences

4 - Word Relations

5 - Sentence Comprehension

SCAT I (forms 2A, 2B, 3A, 3B, 4A) contributed 65 items. The items
consisted of synonyms that used instruction set (1), and incomplete
sentences that used instruction set (5).

SCAT II (forms 1A, 2A, 2B, 3A, 3B, 4A) contributed 107 items. All
items were word relations, which utilized instruction set (9).

STEP II contributed 39 items. All items were sentence comprehension.
Instruction set (13) was used.

PSAT contributed 56 items. Opposites, incomplete sentences, and word
relations were the various types that were employed. Instruction
set (3) was used with items that were opposites, (7) with incomplete
sentences, and (11) with word relations.

SAT contributed 27 items. Types 2, 3, and 4 were used. Instruction
sets (3), (7), and (11) were used respectively.

GRE contributed 69 items. Types 2, 3, and 4 were used. Instruction
sets (3), (7), and (11) were used respectively.
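As a quick consistency check, the contributions listed above can be tallied. The sketch below only restates the counts from the summary sheet, reading the source names as SCAT I, SCAT II, STEP II, PSAT, SAT and GRE; the variable names are ours.

```python
# Items contributed by each source test, as listed on the summary sheet.
contributions = {
    "SCAT I": 65,
    "SCAT II": 107,
    "STEP II": 39,
    "PSAT": 56,
    "SAT": 27,
    "GRE": 69,
}

# Pool size implied by the summary sheet, and each source's share of it.
total = sum(contributions.values())
shares = {test: round(100 * n / total, 1) for test, n in contributions.items()}

print(total)
for test, share in shares.items():
    print(f"{test:8s} {contributions[test]:4d} items  {share:5.1f}%")
```

The total is simply the sum of the six stated counts; the summary sheet itself does not state a pool size, so the figure should be treated as derived rather than reported.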
Item Source (Test & Form Name): SCAT I 3A, SCAT I 3B

 Item  Record  Instr  Key    IPA     IPB   IPC  Type  Form  Items   Line  Index
    #       #    Set                                  Code  Cont.

SCAT I 3A
   37     351          A    1.51    -.42   .16          53     22    2-6  I 3A
   42     352          B    1.91    -.35   .25          53     22    2-6
   22     353          A    1.60    -.98   .14          53     22    2-6
   13     354          E     .71   -1.28   .14          53     22    2-6
    1     355          E     .51   -3.81   .14          53     22    2-6
          356          D     .73   -2.46   .14          53     22    2-6
    3     357          C     .90   -2.74   .14          53     22    2-6
    ?     358      5   ?    1.??       ?                53               ?
  304     359      5   B    1.56     .63   .15     3    53     22    3-7
  308     360      5   E     .92     .69   .14     3    53     22    4-8
  234     361      5   E     .72    -.68   .14     3    53     22    3-7
  254     362      5   A    1.09    -.26   .14     3    53     22    5-9
  238     363      5   D     1.0    -.54   .14     3    53     22    4-8
  226     364      5   B    1.58    -.92   .10     3    53     22    4-8
  205     365      5   E     .77   -1.99   .14     3    53     22    4-8

SCAT I 3B
   59     301          D    1.08    -.02   .15                 17    2-6  I 3B
   47     302          E    1.67    -.27   .19          54     17    2-6
   14     303          D    1.28   -1.16   .15          54     17    2-6
    6     304          A     .60   -2.07   .15          54     17    2-6
   11     305          E    1.34   -1.56   .15          54     17    2-6
   26     306          B    1.30    -.91   .15          54     17    2-6
    5     307          B     .76   -2.26   .15          54           2-6
   12     308          E    1.28   -1.39   .15          54           2-6
    2     309          A     .59   -2.83   .15          54
  358     310      5   D    1.27    1.42   .10     3    54           4-8
  246     311      5   B     .86    -.38   .15     3    54           4-8
  269     312      5   E     .45     .07   .15     3    54           3-7
  253     313      5   E    1.50    -.22   .10     3    54           3-7
  250     314      5   B    1.63    -.32   .15     3    54           3-7
  213     315      5   B    1.36   -1.49   .15     3    54           4-8
  209     316      5   B     .66   -1.88   .15     3    54           4-8
  207     317      5   C    1.33   -1.93   .15     3    54           3-7
Item Source (Test & Form Name): SCAT II 1A

 Item  Record  Instr  Key    IPA     IPB   IPC  Type  Form  Items   Line  Index
    #       #    Set                                  Code  Cont.

  74?     135      9   C    1.4?    2.97   .11     4    55     19    2-5  II 1A
  757     136      9   B     .?4    3.35   .31     4    55     19    2-5
  741     137      9   C    1.48    2.51   .17     4    55     19    2-5
  737     138      9   A     .67    2.40                       19    2-5
  714     139      9   ?     .66    1.41   .?0          55     19    2-5
  729     140      9   C    1.35    2.23   .24          55     19    2-5
  747     141      9   B    1.4?    2.85   .2?          55           2-5
  743     142      9   A     .77    2.5?   .23     4    55     19    2-5
  716     143      9        1.04    1.93   .17     4    55     19    2-5
  723     144      9   D     .62    2.06   .20     4    55     19    2-5
  684     145      9   D     .82    1.26   .09     4    55     19    2-5
  ???     146      9   B     .99    1.2?   .09          55     19    2-5
  667     147      9   A     .?4    1.03   .17     4    55     19    2-5
  655     148      9   B     .94     .32   .09     4    55     19    2-5
  606     149      9   A     .57     .23   .17     4    55     19    2-5
  679     150      9   B     .31    1.20   .17     4    55     19    2-5
  472     151      9   D     .72   -1.47   .17     4    55     19    2-5
  570     152      9   A    1.04    -.05   .17     4    55     19    2-5
  4?4     153      9   A     .72   -1.29   .17     4    55     19    2-5
Item Source (Test & Form Name): SCAT II 2A

 Item  Record  Instr  Key    IPA     IPB   IPC  Type  Form  Items   Line  Index
    #       #    Set                                  Code  Cont.

  692     154      9   B    1.74    1.43   .06     4    56      9    2-5  II 2A
  711     155      9   C    1.60    1.31   .17     4    56      9    2-5
  696     156      9   D    1.07    1.52   .20     4    56           2-5
  638     157      9   B     .92    1.34   .20     4    56      9    2-5
  672     158      9   A     .69    1.12   .22     4    56      9    2-5
  650     159      9   C     .84     .74   .24     4    56      9    2-5
  615     160      9   A     .52     .41   .20     4    56      9    2-5
  560     161      9   B     .70    -.15   .24     4    56      9    2-5
  459     162      9   A     .75   -1.63   .20     4    56      9    2-5
  740     163      9   D    1.36    2.45   .12     4    57     15    2-5
  733     164      9   A     .75    2.41   .15     4    57     15    2-5
  706     165      9   D    1.26    1.65   .27     4    57     15    2-5
  700     166      9   D    1.05    1.60   .11     4    57     15    2-5
  674     167      9   B    1.20    1.17   .20     4    57     15    2-5
Item Source (Test & Form Name): STEP II 3B, STEP II 4A

 Item  Record  Instr  Key    IPA     IPB   IPC  Type  Form  Items   Line  Index
    #       #    Set                                  Code  Cont.

STEP II 3B
  865     262     13   B    1.08    -.05     ?          64     14    5-8  II 3B
  870     263          D    1.45     .12   .20
  856     264          B    1.62    -.27   .31
  806     265          C    1.26   -1.65   .21
  824     266          C    1.15   -1.07   .20
  859     267          B     .92    -.20   .23
  815     268          B    1.36   -1.34   .20
  780     269          B     .63   -2.66   .20
  809     270          A     .97   -1.50   .20
  772     271          C     .75   -3.19   .20
  791     272          B     .62   -2.06   .20
  794     273          A     .69   -?.9?   .20
  779     274          B     .80   -2.72   .20
  804     275          A    1.00   -1.68   .20

Line column, remaining STEP II 3B entries in printed order (row
alignment not recoverable): 7-10, 6-9, 5-8, 4-7, 5-8, 4-7, 5-8, 4-7,
5-8, 6-9

STEP II 4A
  868     242          C    1.39     .06   .22          65     20    5-8  II 4A
  844     243          D    1.05    -.68   .22
  785     244          C     .97   -2.44   .27
  773     245          C     .94   -3.19   .15                       6-9
  781     246          D     .81   -2.54    .2
  784     247          A     .82   -2.46   .24                       5-8
  774     248          A     .98   -3.13    .2                       4-7
  771     249          D     .79   -3.42    .2
  776     250          C     .90   -3.01   .23
  778     251          A     .66   -2.78    .2                       5-8
  782     252          B     .64   -2.52   .24                       4-7
  770     253          C    1.16   -3.57    .2                       5-8
  765     254          D     .66   -4.13    .2
  767     255          C    1.29   -3.87    .2
  769     256          B     .92   -3.60    .2
  768     257          D     .78   -3.83   .15
  766     258          D     .78   -3.87    .2                       4-7
  764     259          C     .70   -4.40    .2                       6-9
  762     260          C     .78   -4.79    .2
  763     261          A     .58   -4.73    .2
Item Source (Test and Form Name): STEP II 2A, STEP II 2B, STEP II 3A,
SCAT II 4A

 Item  Record  Instr  Key    IPA     IPB   IPC  Type  Form  Items   Line  Index
    #       #    Set                                  Code  Cont.

SCAT II 4A
  413     236      9   C     .32   -5.26   .20     4    60     35    2-5  II 4A
  415     237      9   D     .76   -4.83   .20     4    60           2-5
  426     238      9   B     .45   -3.00   .20     4    60           2-5
  412     239      9   C     .60   -5.45   .20     4    60           2-5
  410     240      9   B     .54   -5.53   .20     4    60           2-5
  411     241      9   C     .51   -5.48   .20     4    60           2-5

  893     293     13   D     .57    1.11   .15     5    61      8    5-8
  897     294          B    1.12    1.53   .32                       6-9
  874     295          A    1.74     .28   .24
  879     296          B     .89     .60   .24                       5-8
  848     297
 Item  Record  Instr  Key    IPA     IPB   IPC  Type  Form  Items   Line  Index
    #       #    Set                                  Code  Cont.

  719      16          B     .57    1.97   .17
  753      17          D     .55    3.35   .17
  750      18          B     .53    3.06   .17
  756      19          D     .74    3.82   .19
  758      20          E     .59    3.87   .17
  1?8      21      3   B     .84    1.37   .17     2
  199      22      3   D     .95    4.21   .20     2
  192      23          E    1.65    3.59   .19
  197      24          E    1.51    3.87   .09
  200      25          A     .97    4.66   .15
  198      26          B     .90    4.12   .20
  335      27      7   E     .8?    1.11   .15     3
  389      28          D     .64    2.66   .17
  407      29          E     .74    3.86   .07
  396      30          D     .84    2.91   .17
  382      31          A     .82    2.17   .17
  392      32          B     .55    2.78   .17
  744      33     11   A     .63    2.58   .16     4
  742      34          C     .78    2.55   .03
  735      35          E     .69    2.37   .16
  749      36          E    1.13    3.04   .27
  754      37          A    1.02    3.44   .22
  10?      38      3         .59            .19     2
  163      39          B    1.40    2.08   .27     2
  187      40          E     .77    3.16   .26
  184      41          C    1.01    2.86   .19
  183      42          A     .59    2.80   .16
  191      43          E     .57    3.49   .19

Remaining column entries, in printed order (row alignment not
recoverable): 4; 71; 72; 30, 2-6; 5-9; 3-7; 4-8; 5-9; 3-7; 2-6; G1;
6-10; 4-8; 5-9; 3-7; 5-9; 39, 2-6, G2
INSTRUCTION SETS

INSTRUCTION SET NO.    TEXT

1 with 5 lines         This question has one word followed by
                       five words or phrases lettered A, B, C, D and E.
                       Read the word. Then pick the lettered word or
                       phrase that has the same or almost the same
                       meaning.

2 with 2 lines         Pick the word or phrase that has the same
                       or almost the same meaning as the first word.

3 with 4 lines         This question has one word followed by 5 words
                       or phrases lettered A through E. Read the word.
                       Then pick the lettered word or phrase that is
                       most nearly opposite in meaning.

4 with 2 lines         Pick the word or phrase that is most nearly
                       opposite in meaning to the first word.

5 with 6 lines         This question has a sentence in which one
                       word is missing; a blank space indicates where
                       the word has been removed from the sentence.
                       Beneath the sentence are five words lettered
                       A, B, C, D and E, one of which is the missing
                       word. You are to select the missing word by
                       deciding which one of the five words best fits
                       in with the meaning of the sentence.

6 with 2 lines         Select the missing word which best fits in
                       with the meaning of the sentence.

7 with 5 lines         This sentence has one or more blank spaces,
                       each blank indicating that a word has been
                       omitted. Beneath the sentence are 5 lettered
                       words or sets of words. You are to choose the
                       one word or set of words which, when inserted
                       in the sentence, best fits in with the meaning
                       of the sentence as a whole.

8 with 2 lines         Select the word or set of words which best
                       completes the following sentence.

9 with 6 lines         This question begins with 2 words. These two
                       words go together in a certain way. Under them
                       there are 4 other pairs of words lettered
                       A, B, C, D. Find the lettered pair of words
                       that go together in the same way as the first
                       pair of words.

10 with 3 lines        Find the lettered pair of words that go together
                       in the same way as the first pair of words.

11 with 5 lines        In this question a related pair of words or
                       phrases is followed by 5 lettered pairs of words
                       or phrases. Select the lettered pair which best
                       expresses a relationship similar to that expressed
                       in the original pair.

12 with 3 lines        Select the lettered pair of words or phrases
                       which best expresses a relationship similar to
                       that expressed in the original pair.

13 with 5 lines        In this question, the first sentence is followed
                       by an incomplete statement and 4 suggested answers,
                       lettered A, B, C, and D. You are to decide which
                       one of these answers is best. Your choice should
                       be based on what the first sentence says.

14 with 5 lines        In this question, the first sentence is followed
                       by an incomplete statement and 4 suggested answers
                       lettered A, B, C, D. You are to decide which one
                       of these answers is best. Your choice should be
                       based on what the first sentence says.



Appendix C
Item Selection Tables

Following are the item selection tables employed in
Forms A and B of the BRTT. Together with the item data
reported in Appendix B, they constitute a description of
the test as used in the current study.
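To make the item data easier to read, the sketch below shows one way the IPA, IPB and IPC columns of Appendix B could be used. It assumes, without confirmation from this appendix, that they are the discrimination (a), difficulty (b) and lower asymptote (c) of a three-parameter logistic item response model, and that the conventional scaling constant 1.7 applies. The function names are hypothetical and this is not the BRTT program itself.

```python
import math

D = 1.7  # conventional logistic scaling constant (assumed, not stated in the report)

def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response at ability theta."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def expected_proportion_correct(theta, items):
    """Mean probability of a correct response over a list of (a, b, c) items."""
    return sum(p_correct(theta, a, b, c) for a, b, c in items) / len(items)

# First three rows of the SCAT I 3A listing, read as (IPA, IPB, IPC).
items = [(1.51, -0.42, 0.16), (1.91, -0.35, 0.25), (1.60, -0.98, 0.14)]

for theta in (-2.0, 0.0, 2.0):
    print(theta, round(expected_proportion_correct(theta, items), 3))
```

Under this reading, an examinee whose ability equals an item's IPB value has a probability of success halfway between IPC and 1, and averaging the probabilities over all items gives an expected proportion correct for the pool.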
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